MULTIPLICATIVE FREE CONVOLUTION AND
INFORMATION-PLUS-NOISE TYPE MATRICES

By Øyvind Ryan and Mérouane Debbah
University of Oslo and Institut Eurécom

Free probability and random matrix theory have proved to be a
fruitful combination in many fields of research, such as digital
communications, nuclear physics and mathematical finance. This paper
strengthens the link between free probability and eigenvalue
distributions of random matrices. It is shown how multiplicative free
convolution can be used to express known results for eigenvalue
distributions of a type of random matrices called Information-Plus-Noise
matrices. The result is proved in a free probability framework, and some
new results, useful for problems related to free probability, are
presented in this context. The connection between free probability and
estimators for covariance matrices is also made through the notion of
free deconvolution.

1. Introduction. Applications of free probability have been growing
rapidly in recent years. Random matrices and their limit eigenvalue
distributions are an area where free probability has proved to be useful.
Random matrices are a useful tool for modelling systems, for instance in
digital communications [19, 20], nuclear physics [g, 18] and mathematical
finance [2]. This paper is a contribution to the random matrix facet of free
probability, in that the connection between certain random matrices and free
probability is clarified further. We will focus on what we call
Information-Plus-Noise Type Matrices, i.e. random matrices of the form

(1.1) $W_n = \frac{1}{N}(R_n + \sigma X_n)(R_n + \sigma X_n)^*$,

where $R_n$ and $X_n$ are independent random matrices of dimension $n \times N$.
These can be thought of as sample covariance matrices of random vectors
$r_n + \sigma x_n$, where $r_n$ can be interpreted as a vector carrying the
information in a system, and $x_n$ as additive noise, with $\sigma$ the
strength of the noise. We impose no assumption on independence between
samples. We will use some common restrictions on the noise: $X_n$ will
contain i.i.d. complex entries of unit variance, and $n$ and $N$ will be
increased so that

(1.2) $\lim_{n \to \infty} \frac{n}{N} = c$.

In [3], Dozier and Silverstein explain how the limit eigenvalue distribution
$\mu_W$ of the matrix $W_n$ can be found, based on knowledge of the limit
eigenvalue distribution $\mu_\Gamma$ of the matrix $\Gamma_n = \frac{1}{N} R_n R_n^*$.
The result is expressed in terms of a solution to a functional equation
(equation (4.1)). We will show that there is an equivalent way of expressing
this solution, using the concept of multiplicative free deconvolution,
denoted by $\boxbslash$ (multiplicative free convolution, as well as freeness
and asymptotic freeness, are defined in section 2).
The following is the main result of the paper:

Theorem 1.1. Assume that the entries $X^n_{ij}$ of $X_n$ are Gaussian,
independent and identically distributed with expectation 0 and variance 1.
Assume also that the empirical eigenvalue distribution of
$\Gamma_n = \frac{1}{N} R_n R_n^*$ converges in distribution almost surely to a
compactly supported probability measure $\mu_\Gamma$. Then the empirical
eigenvalue distribution of $W_n$ also converges in distribution almost surely
to a compactly supported probability measure $\mu_W$ uniquely identified by

(1.3) $\mu_W \boxbslash \mu_c = (\mu_\Gamma \boxbslash \mu_c) \boxplus \mu_{\sigma^2 I}$.

Some remarks are needed to explain theorem 1.1. By the empirical eigenvalue
distribution of an $n \times n$ random matrix $X$ we mean the (random)
atomic measure

$\frac{1}{n}\left(\delta_{\lambda_1(X)} + \cdots + \delta_{\lambda_n(X)}\right)$,

where $\lambda_1(X), \ldots, \lambda_n(X)$ are the (random) eigenvalues of $X$.
That $\mu_n$ converges in distribution to $\mu$ means that the moments of
$\mu_n$ converge to the moments of $\mu$. Theorem 1.1 requires compactly
supported measures, and these have moments of all orders.
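
As a small illustration of the moments referred to above: the k-th moment of the empirical eigenvalue distribution of $X$ equals the normalized trace $\mathrm{tr}_n(X^k)$, so it can be computed without diagonalizing $X$. The sketch below is only an illustration under that observation; the function names `matmul` and `eigenvalue_moments` are hypothetical and do not come from the paper.

```python
def matmul(A, B):
    """Plain product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def eigenvalue_moments(X, max_order):
    """Moments 1..max_order of the empirical eigenvalue distribution of X.

    Uses the identity (1/n) * sum_i lambda_i(X)^k = tr_n(X^k),
    where tr_n is the normalized trace.
    """
    n = len(X)
    power = [row[:] for row in X]  # current power X^k, starting at k = 1
    moments = []
    for _ in range(max_order):
        moments.append(sum(power[i][i] for i in range(n)) / n)
        power = matmul(power, X)
    return moments


# The symmetric matrix [[2, 1], [1, 2]] has eigenvalues 1 and 3,
# so its moments agree with those of diag(1, 3).
print(eigenvalue_moments([[2, 1], [1, 2]], 3))
print(eigenvalue_moments([[1, 0], [0, 3]], 3))
```

Both calls print the same list, since the two matrices share the eigenvalues 1 and 3.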

The conditions in theorem 1.1 are somewhat stronger than those in [3]
due to the restriction to measures with compact support. Contrary to [3],
we also restrict to noise matrices with Gaussian entries.

Theorem 1.1 yields a short expression for $\mu_W$, removing the need for
solving the equation in [3] directly. It essentially says that the connection
between $\mu_W$ and $\mu_\Gamma$ can be expressed compactly in the deconvolved
domain, where the connection can be viewed as a shift of the spectrum by the
noise variance $\sigma^2$. The proof of theorem 1.1 is based on methods from
free probability, with some new results established along the way. Some of
these deserve extra attention, in particular theorem 3.4. This can be thought
of as a version of theorem 1.1 where $\mu_W$ and $\mu_\Gamma$ are interpreted
as distributions of free random variables.

Theorems 3.2 and 3.3 also deserve some extra attention. These address
asymptotic freeness almost everywhere [7] for two random matrices where

1. both converge in distribution almost everywhere to compactly
supported limits, and

2. one of the random matrices is standard unitary (theorem 3.2) or
Gaussian (theorem 3.3).

These results expand known results from [2] for asymptotic freeness. The
proofs of theorems 3.2 and 3.3 use approximations of random matrices by
deterministic matrices. Asymptotic freeness of Gaussian/standard unitary
random matrices and uniformly norm-bounded deterministic matrices is
well-known (lemma 4.3.2 in [7]). Unfortunately, norm-bounded deterministic
matrices are not able to approximate the random matrices under
consideration. We solve the problem by generalizing to matrices satisfying
uniform $\|\cdot\|_p$-norm bounds instead, where $\|\cdot\|_p$ denotes the
Schatten p-norm (with respect to $\mathrm{tr}_n$), defined for $p > 1$ by
$\|A\|_p = \mathrm{tr}_n(|A|^p)^{1/p}$ ($A \in M_n(\mathbb{C})$). We
prove that matrices satisfying such bounds can be used to approximate our
random matrices, and that they also give asymptotic freeness as in lemma
4.3.2 in [7] (theorem 3.1).

Theorem 1.1 is actually proved by combining theorems 3.3 and 3.4 through
another approximation argument (see theorem [33]). While [3] restricts to the
distribution of $\frac{1}{N}(R_n + \sigma X_n)(R_n + \sigma X_n)^*$, we show
more in that any mixed moments of $\frac{1}{N} R_n R_n^*$ and
$\frac{1}{N} X_n X_n^*$ are obtained through our asymptotic freeness results.

Recent works [l3, [3] show that multiplicative free convolution also
admits an efficient implementation in terms of the moments of the operand
measures. The basic results on free probability we need for this are proved
in this paper (theorems 2.1 and 2.2). A consequence is that existing
computational frameworks can be used in obtaining $\mu_\Gamma$ and $\mu_W$.
In [3], $\mu_\Gamma$ and $\mu_W$ are illustrated in terms of signal processing
applications, and simulations are run using a computational framework
building on theorems 2.1 and 2.2. A useful consequence of the link with free
probability is that the "inverse problem" (i.e. that of finding $\mu_\Gamma$
from $\mu_W$) can be solved within the same framework, since the framework
embraces convolution as well as deconvolution.

The eigenvalue distribution of $\Gamma_n$ provides us with possibilities for
estimating the covariance matrices of the system through the so-called
G-estimators. These will be reviewed, and it will be shown how multiplicative
free convolution can be used to rewrite such estimators to a very simple
form. It will be apparent from this that the $G_2$-estimator actually can be
viewed as
a step in expressing from fir- 

While the results mentioned here are hard to prove, some of them should
come as no surprise. For instance, 1J| has already made the connection
between Information-Plus-Noise type matrices and multiplicative free
convolution. This paper also indicates that some of the mentioned results 
are already known, by saying that random matrices with Haar-distributed 
eigenvectors are asymptotically free from any random matrices independent 
from them. However, the generality in which this should hold is not indi- 
cated. Also, ij] considers only Gaussian matrices, and the connection with 
already existing estimators of covariance matrices was not made. 

This paper is organized as follows. Section 2 contains notation and
preliminaries for various free probability tools, like free transforms and
combinatorial aspects. The mentioned implementation of free convolution
builds on the combinatorial expression of freeness, and the results needed on
this are explained in section 2.1. The proof of theorem 1.1 is presented in
section 3. A sketch of the proof is first given, followed by the proofs of
theorems 3.1, 3.2, 3.3 and 3.4. Section 4 first states the results we need
from [3], and sketches the proof of the equivalence of these and theorem 1.1.
This sketch is then followed by the rest of the details. The various
transforms used in free probability (section 2) are used in this direction.
In section 5 we state the principles of G-analysis and the expression for the
$G_2$-estimator. We also prove the theorem which expresses the
$G_2$-estimator in terms of free probability.



2. Notation and preliminaries. In the following, uppercase symbols
will be used for matrices, and $(\cdot)^*$ will denote hermitian transpose.
$I_n$ will represent the identity matrix of order n. We will focus here on
certain noncommutative probability spaces. A noncommutative probability space
is a pair $(A, \phi)$ where $A$ is a unital *-algebra and $\phi$ is a
normalized (i.e. $\phi(I) = 1$) linear functional on $A$. The elements of $A$
are called random variables. The probability spaces we will encounter are
mostly $(M_n(\mathbb{C}), \mathrm{tr}_n)$, i.e. $n \times n$ matrices
equipped with the normalized trace. Any matrix can be associated with a
probability measure through its eigenvalue distribution. We will mostly be
concerned with probability measures with compact support.

Definition 2.1. A family of unital *-subalgebras $(A_i)_{i \in I}$ will be
called a free family if

(2.1) $\left\{ \begin{array}{l} a_j \in A_{i_j} \\ i_1 \neq i_2, i_2 \neq i_3, \ldots, i_{n-1} \neq i_n \\ \phi(a_1) = \phi(a_2) = \cdots = \phi(a_n) = 0 \end{array} \right\} \Rightarrow \phi(a_1 \cdots a_n) = 0$.

(2.1) enables us to calculate the mixed moments of $a_1$ and $a_2$ when they
are free. In particular, the moments of $a_1 + a_2$ and $a_1 a_2$ can be
calculated. This gives us two new probability measures, which depend on the
probability measures of $a_1$, $a_2$ only (i.e. not on their realizations).
Therefore we can define two operations on the set of probability measures:
Additive free convolution

(2.2) $\mu_1 \boxplus \mu_2$

for the sum of free random variables, and multiplicative free convolution

(2.3) $\mu_1 \boxtimes \mu_2$

for the product of free random variables.

Let $F^A$ denote the empirical distribution function (e.d.f.) of the
eigenvalues of $A$ (so that $F^A(x)$ is the proportion of eigenvalues of $A$
which are $\leq x$). When we have a sequence of e.d.f.'s $F^{A_n}$, we will
use the notation

for weak convergence, where $F^\mu$ is the cumulative distribution function of
the measure $\mu$. We will also write a.s. as shorthand notation for almost
sure convergence.

Some random matrices and limit distributions occur naturally in many
contexts. If the entries of the $n \times N$ (with
$\lim_{n \to \infty} \frac{n}{N} = c$) random matrices $W_n$ have zero mean
and unit variance, the empirical eigenvalue distribution of
$\frac{1}{N} W_n W_n^*$ converges almost surely to the so-called Marchenko
Pastur law $\mu_c$ ([21] page 9). These are also called the free Poisson
distributions, and are characterized by the density

(2.4) $f^{\mu_c}(x) = \left(1 - \frac{1}{c}\right)^+ \delta(x) + \frac{\sqrt{(x-a)^+ (b-x)^+}}{2 \pi c x}$,

where $(z)^+ = \max(0, z)$, $a = (1 - \sqrt{c})^2$ and $b = (1 + \sqrt{c})^2$.
Notation similar to that of the Marchenko Pastur law is used for the
distribution $\mu_a$ of a random variable $a$. We avoid confusion by never
using $c$ to denote random variables. $\mu_1$ will always mean the Marchenko
Pastur law with parameter one. Marchenko Pastur laws are some of the most
basic random matrix building blocks, as they appear as limits for large
random matrices in many contexts. This paper will demonstrate that this is
indeed the case also for the type of systems we consider.
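
A quick numerical sanity check of the Marchenko Pastur density with parameter $c$, edge points $(1 \mp \sqrt{c})^2$ and normalization $2 \pi c x$, can be sketched as follows. This is only an illustration, restricted to $0 < c < 1$ so that there is no atom at zero; the function name `mp_moment` and the substitution $x = \text{mid} + \text{rad} \cos\theta$ used to remove the square-root endpoint behaviour are choices made here, not part of the paper.

```python
import math


def mp_moment(c, k, steps=2000):
    """k-th moment of the Marchenko Pastur density with parameter 0 < c < 1.

    Integrates x^k * sqrt((x-a)(b-x)) / (2*pi*c*x) over [a, b] with the
    midpoint rule after substituting x = mid + rad*cos(theta).
    """
    a = (1 - math.sqrt(c)) ** 2
    b = (1 + math.sqrt(c)) ** 2
    mid, rad = (a + b) / 2, (b - a) / 2
    h = math.pi / steps
    total = 0.0
    for j in range(steps):
        theta = (j + 0.5) * h
        x = mid + rad * math.cos(theta)
        # sqrt((x-a)(b-x)) = rad*sin(theta) and |dx| = rad*sin(theta) dtheta
        total += (rad * math.sin(theta)) ** 2 / (2 * math.pi * c * x) * x ** k
    return total * h


c = 0.5
print([round(mp_moment(c, k), 6) for k in range(4)])
```

For $c = 0.5$ the computed values should be close to the total mass 1 and to the moments $1$, $1 + c$ and $1 + 3c + c^2$.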

We will not use the characterization of the Marchenko Pastur law as in
(2.4) directly. Rather we will work with equivalent expressions of it through
the transforms defined in this section. The transforms we define will only
be applied to probability measures with support contained in the positive
real line.

The Stieltjes transform ([21] page 38) of a probability measure $\mu$ is the
analytic function on $\mathbb{C}^+ = \{z \in \mathbb{C} : \Im z > 0\}$ defined by

(2.5) $m_\mu(z) = \int_{-\infty}^{\infty} \frac{1}{\lambda - z} dF^\mu(\lambda)$.

A convenient inversion formula for the Stieltjes transform also exists, so
that $m_\mu$ uniquely identifies $\mu$. If $\mu$ is assumed to have
nonnegative support, $m_\mu$ can be analytically continued to the negative
part of the real line. If $\mu = \mu_X$ for a non-negative random variable
$X$, $m_\mu$ is strictly monotone on the negative real line, taking values in
the interval $[0, E(\frac{1}{X})]$. We will use the fact that if we know
$m_\mu(z)$ in an interval $(-\epsilon, 0)$ for some $\epsilon > 0$, we also
know $m_\mu(z)$ for all other values of $z$, and hence we also know $\mu$ (use
the Stieltjes inversion formula).



The $\eta$-transform ([21] page 40) is defined for measures $\mu$ with support
on the positive real line, and for nonnegative real numbers $z$ by

(2.6) $\eta_\mu(z) = \int_0^{\infty} \frac{1}{1 + z\lambda} dF^\mu(\lambda)$.

$\eta_\mu(z)$ is a strictly monotonically decreasing function. As such it
simplifies many derivations and statements of results. The inverse is tightly
connected to the S-transform (see below). Its connection with the Stieltjes
transform is

(2.7) $\eta_\mu(z) = \frac{1}{z} m_\mu\left(-\frac{1}{z}\right)$.

Therefore $\eta_\mu(z)$ uniquely identifies $m_\mu(z)$, since $m_\mu(z)$ for
real, negative $z$ can be continued analytically to $\mathbb{C}^+$. We will
use the fact that if we know $\eta_\mu(z)$ in an interval $(0, \epsilon)$ for
some $\epsilon > 0$, we also know $\mu$.
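
The relation $\eta_\mu(z) = \frac{1}{z} m_\mu(-\frac{1}{z})$ follows directly from the two integral definitions, since $\frac{1}{\lambda + 1/z} = \frac{z}{1 + z\lambda}$. The sketch below checks it for a discrete measure placing equal mass on a list of positive points; the function names `stieltjes` and `eta` and the sample points are hypothetical choices made for illustration.

```python
def stieltjes(points, z):
    """Stieltjes transform of the uniform measure on `points`: mean of 1/(lam - z)."""
    return sum(1.0 / (lam - z) for lam in points) / len(points)


def eta(points, z):
    """eta-transform of the uniform measure on `points`: mean of 1/(1 + z*lam)."""
    return sum(1.0 / (1.0 + z * lam) for lam in points) / len(points)


points = [0.5, 1.0, 2.5]  # support on the positive real line
z = 0.7
print(eta(points, z), stieltjes(points, -1.0 / z) / z)  # the two values agree
print(eta(points, 0.1) > eta(points, 1.0))              # eta is decreasing in z
```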

The R-transform ([21] page 48) has domain of definition $\mathbb{C}^+$ and can
be defined in terms of the Stieltjes transform as

(2.8) $\mathcal{R}_\mu(z) = m_\mu^{-1}(-z) - \frac{1}{z}$.

The importance of the R-transform comes from its additive property for
the distribution of the sum of free random variables $A_1$ and $A_2$,

(2.9) $\mathcal{R}_{\mu_{A_1} \boxplus \mu_{A_2}}(z) = \mathcal{R}_{\mu_{A_1}}(z) + \mathcal{R}_{\mu_{A_2}}(z)$.

Slightly different versions of the R-transform are encountered in the
literature. The one above is from [21]. In connection with free combinatorics,
another definition is used, namely $R_\mu(z) = z\mathcal{R}_\mu(z)$. Of
course, $R_\mu(z)$ also satisfies (2.9).

The S-transform ([21] page 50) is defined on $(-1, 0)$. It can be defined in
terms of the $\eta$-transform by

(2.10) $S_\mu(z) = -\frac{z+1}{z} \eta_\mu^{-1}(z + 1)$.

The Marchenko Pastur law (2.4) can be shown to have S-transform
$S_{\mu_c}(z) = \frac{1}{1 + cz}$ ([21] page 51). The importance of the
S-transform comes from its multiplicative property for the distribution of
the product of free random variables $a_1$ and $a_2$:

(2.11) $S_{\mu_{a_1 a_2}}(z) = S_{\mu_{a_1}}(z) S_{\mu_{a_2}}(z)$.

If the values of $\eta_\mu(z)$ or $S_\mu(z)$ are known in an interval, one
also knows $\mu$.

Freeness, additive and multiplicative free convolution have a combinatorial
description involving these transforms which we will use in some of
our proofs. These combinatorial descriptions build on the concept of
noncrossing partitions:

Definition 2.2. A partition $\pi$ is called noncrossing if whenever we have
$i < j < k < l$ with $i \sim k$, $j \sim l$ ($\sim$ meaning belonging to the
same block), we also have $i \sim j \sim k \sim l$ (i.e. $i, j, k, l$ are all
in the same block). The set of noncrossing partitions of $\{1, \ldots, n\}$ is
denoted $NC(n)$.
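
The noncrossing condition can be made concrete with a brute-force enumeration. The sketch below generates all set partitions of $\{1, \ldots, n\}$ and keeps those with no crossing pair of blocks; the function names are hypothetical, and this exhaustive approach is only meant for small $n$.

```python
from itertools import combinations


def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition  # `first` forms its own block
        for idx in range(len(partition)):  # or joins an existing block
            yield partition[:idx] + [[first] + partition[idx]] + partition[idx + 1:]


def is_noncrossing(partition):
    """True if no i < j < k < l exist with i, k in one block and j, l in another."""
    for b1, b2 in combinations(partition, 2):
        for i, k in combinations(sorted(b1), 2):
            for j, l in combinations(sorted(b2), 2):
                if i < j < k < l or j < i < l < k:
                    return False
    return True


def noncrossing_partitions(n):
    """All noncrossing partitions of {1, ..., n}."""
    return [p for p in set_partitions(list(range(1, n + 1))) if is_noncrossing(p)]


print([len(noncrossing_partitions(n)) for n in range(1, 6)])
```

The counts are the Catalan numbers; for $n = 4$ the only partition removed is $\{\{1, 3\}, \{2, 4\}\}$.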

$NC(n)$ becomes a lattice under the refinement order of partitions. An
ingredient we need in making the connections between freeness and the
noncrossing partitions is the complementation map of Kreweras, which is
a lattice anti-isomorphism of $NC(n)$. To define this we need the circular
representation of a partition: We mark n equidistant points $1, \ldots, n$
(numbered clockwise) on the circle, and form the convex hull of points lying
in the same block of the partition. This gives us a number of convex sets
$H_i$, equally many as there are blocks in the partition, which do not
intersect if and only if the partition is noncrossing. Put names
$\bar{1}, \ldots, \bar{n}$ on the midpoints of the $1, \ldots, n$ (so that
$\bar{i}$ is the midpoint of the segment from $i$ to $i + 1$). The complement
of the set $\cup_i H_i$ is again a union of disjoint convex sets
$\tilde{H}_i$. All this is demonstrated in figure 1, where the $H_i$ are the
scrambled areas with dashed borders, and the $\tilde{H}_i$ are the scrambled
areas with non-dashed borders. We will refer to this figure heavily during
the proof of theorem 3.4. We can now define the Kreweras complementation map:

Fig 1. The circular representation of a partition of $\{1, \ldots, 16\}$.

Definition 2.3. The Kreweras complement of $\pi$, denoted $K(\pi)$, is the
partition on $\{\bar{1}, \ldots, \bar{n}\}$ determined by

$\bar{i} \sim \bar{j}$ in $K(\pi)$ $\iff$ $\bar{i}, \bar{j}$ belong to the same convex set $\tilde{H}_k$.
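
The geometric definition can be turned into a short computation. The sketch below relies on a standard equivalent description that is not stated in the text and is therefore an assumption here: if a noncrossing partition $\pi$ is viewed as a permutation whose cycles are its blocks in increasing order, and $c$ is the cycle $(1\ 2\ \cdots\ n)$, then the blocks of $K(\pi)$ are the cycles of $\pi^{-1} c$. The function name `kreweras` is hypothetical.

```python
def kreweras(partition, n):
    """Kreweras complement of a noncrossing partition of {1, ..., n}.

    Assumes the permutation description K(pi) = pi^{-1} composed with c,
    where c(i) = i + 1 (mod n) and each block of pi is an increasing cycle.
    """
    pi = {}
    for block in partition:
        b = sorted(block)
        for idx, x in enumerate(b):
            pi[x] = b[(idx + 1) % len(b)]
    pi_inv = {v: k for k, v in pi.items()}
    comp = {i: pi_inv[i % n + 1] for i in range(1, n + 1)}  # pi^{-1}(c(i))
    seen, blocks = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        block, x = [], start
        while x not in seen:  # follow one cycle of the composed permutation
            seen.add(x)
            block.append(x)
            x = comp[x]
        blocks.append(sorted(block))
    return sorted(blocks)


pi = [[1, 2], [3, 4]]
k_pi = kreweras(pi, 4)
print(k_pi, len(pi) + len(k_pi))
```

In this example the midpoints $\bar{1}$ and $\bar{3}$ are cut off by the two blocks, while $\bar{2}$ and $\bar{4}$ lie in the same region, and the block counts add up to $n + 1$.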

The connection between the R-transform and noncrossing partitions comes
through the moment-cumulant formula, which relates the moments and the
R-transform coefficients (also called cumulants) for the distribution of a
random variable.

Lemma 2.1. Write the R-transform as a power series,
$R_{\mu_a}(z) = \sum_n \alpha_n z^n$. Then

(2.12) $\phi(a^n) = \sum_{\pi = \{B_1, \ldots, B_k\} \in NC(n)} \prod_{i=1}^{k} \alpha_{|B_i|}$.

This can be used as an alternative definition of the R-transform. We also
need to define the multidimensional R-transform for the joint distribution of
a sequence of random variables. Denote by $\mathbb{C}\langle z_1, \ldots, z_m \rangle$
the space of complex power series in m noncommuting variables $z_i$ with
vanishing constant term. These can be written in the form

$f(z_1, \ldots, z_m) = \sum_{k \geq 1} \sum_{i_1, \ldots, i_k} a_{i_1, \ldots, i_k} z_{i_1} \cdots z_{i_k}$.

In referring to the coefficients of a power series $f$ of this form we will
write

$[coef(i_1, \ldots, i_k)](f) = a_{i_1, \ldots, i_k}$,

and if $\pi = \{B_1, \ldots, B_m\}$,

$[coef(i_1, \ldots, i_k)|B_i](f) = a_{(i_j)_{j \in B_i}}$,
$[coef(i_1, \ldots, i_k); \pi](f) = \prod_i [coef(i_1, \ldots, i_k)|B_i](f)$.

For power series in one variable, the coefficients will also be written in the
form $[coef_k](f)$. For n random variables $a_1, \ldots, a_n$ we define their
joint moment series as the power series
$M_{\mu_{a_1, \ldots, a_n}} \in \mathbb{C}\langle z_1, \ldots, z_n \rangle$ such that

$M_{\mu_{a_1, \ldots, a_n}}(z_1, \ldots, z_n) = \sum_{m \geq 1} \sum_{i_1, \ldots, i_m} \phi(a_{i_1} \cdots a_{i_m}) z_{i_1} \cdots z_{i_m}$,

and we define their joint R-series as the unique power series
$R_{\mu_{a_1, \ldots, a_n}} \in \mathbb{C}\langle z_1, \ldots, z_n \rangle$ such that

(2.13) $\phi(a_{i_1} \cdots a_{i_m}) = \sum_{\pi \in NC(m)} [coef(i_1, \ldots, i_m); \pi](R_{\mu_{a_1, \ldots, a_n}})$.

The result we will use connecting the joint R-series and freeness is the
following:

Lemma 2.2. $(\{a_1, \ldots, a_n\}, \{b_1, \ldots, b_m\})$ is a free family if and only if

$R_{\mu_{a_1, \ldots, a_n, b_1, \ldots, b_m}}(z_1, \ldots, z_{n+m}) = R_{\mu_{a_1, \ldots, a_n}}(z_1, \ldots, z_n) + R_{\mu_{b_1, \ldots, b_m}}(z_{n+1}, \ldots, z_{n+m})$.

This lemma is often summarized by saying that the joint R-series of free
random variables has no mixed terms. A special form of (2.13) and lemma 2.2
we will use is the following: If $(a_1, a_2)$ is a free family, and a mixed
term $a_{i_1} \cdots a_{i_m}$ is given, form the partition with two blocks
$\sigma = \{\sigma_1, \sigma_2\}$, where $\sigma_k = \{j \mid i_j = k\}$. Then

(2.14) $\phi(a_{i_1} \cdots a_{i_m}) = \sum_{\pi \leq \sigma, \pi \in NC(m)} [coef(i_1, \ldots, i_m); \pi](R_{\mu_{a_1, a_2}})$.

Our combinatorial connection with multiplicative free convolution can be 
made complete with the help of the following definition 10[| , Hi : 

Definition 2.4. Given two power series $f$ and $g$, their boxed convolution
$f \boxast g$ is defined by

(2.15) $[coef(i_1, \ldots, i_m)](f \boxast g) = \sum_{\pi \in NC(m)} [coef(i_1, \ldots, i_m); \pi](f) \, [coef(i_1, \ldots, i_m); K(\pi)](g)$.

Boxed convolution is commutative only on power series in one variable [l^ . 
It satisfies the associative law, but not the distributive law. It does not sat- 
isfy linearity properties w.r.t. scalar multiplication. However, the following 
holds and will be useful to us: 

$[coef_n]((cf) \boxast (cg)) = [coef_n](c^{n+1}(f \boxast g))$

and 

[coefn] {mad)) = [coefn] (c'^f) . 

Here we used the shorthand notation $c^n f$ for the power series defined by
$[coef_n](c^n f) = c^n [coef_n](f)$. The first statement is easily proved using
the fact that $|\pi| + |K(\pi)| = n + 1$ for any $\pi \in NC(n)$ [13]. The second statement
is trivial. The following result holds for multiplicative free convolution [l^ : 

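Lemma 2.3. If $(\{a_1, \ldots, a_n\}, \{b_1, \ldots, b_n\})$ is a free family, then

(2.16) $R_{\mu_{a_1 b_1, \ldots, a_n b_n}} = R_{\mu_{a_1, \ldots, a_n}} \boxast R_{\mu_{b_1, \ldots, b_n}}$.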
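
For power series in one variable, boxed convolution can be evaluated by brute force directly from its definition: the coefficient of order m is a sum over $\pi \in NC(m)$ of the coefficients of f along the blocks of $\pi$ times the coefficients of g along the blocks of $K(\pi)$. The sketch below is an illustration for small orders only. All function names are hypothetical, and the Kreweras complement is computed through the permutation description $K(\pi) = \pi^{-1} c$ with $c = (1\ 2\ \cdots\ m)$, which is an assumption not stated in the text.

```python
from itertools import combinations
from math import prod


def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for p in set_partitions(rest):
        yield [[first]] + p
        for idx in range(len(p)):
            yield p[:idx] + [[first] + p[idx]] + p[idx + 1:]


def is_noncrossing(p):
    """True if no two blocks of p cross."""
    return not any(i < j < k < l or j < i < l < k
                   for b1, b2 in combinations(p, 2)
                   for i, k in combinations(sorted(b1), 2)
                   for j, l in combinations(sorted(b2), 2))


def kreweras_block_sizes(p, m):
    """Block sizes of the Kreweras complement, via the cycles of pi^{-1} c."""
    pi = {}
    for block in p:
        b = sorted(block)
        for idx, x in enumerate(b):
            pi[x] = b[(idx + 1) % len(b)]
    pi_inv = {v: k for k, v in pi.items()}
    comp = {i: pi_inv[i % m + 1] for i in range(1, m + 1)}
    seen, sizes = set(), []
    for start in range(1, m + 1):
        size, x = 0, start
        while x not in seen:
            seen.add(x)
            size += 1
            x = comp[x]
        if size:
            sizes.append(size)
    return sizes


def boxed_convolution(f, g, order):
    """Coefficients 1..order of the boxed convolution; f[k-1] is [coef_k](f)."""
    result = []
    for m in range(1, order + 1):
        total = 0
        for p in set_partitions(list(range(1, m + 1))):
            if is_noncrossing(p):
                total += (prod(f[len(b) - 1] for b in p)
                          * prod(g[s - 1] for s in kreweras_block_sizes(p, m)))
        result.append(total)
    return result


zeta = [1, 1, 1, 1]
print(boxed_convolution(zeta, zeta, 4))  # counts the elements of NC(m)
```

Convolving Zeta with itself counts noncrossing partitions, and scaling both operands by a constant $c$ multiplies the coefficient of order n by $c^{n+1}$, in line with $|\pi| + |K(\pi)| = n + 1$.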

One can also define additive and multiplicative free deconvolution in most
cases, i.e. finding $\mu_2$ in (2.3) when $\mu_1$ and $\mu_1 \boxtimes \mu_2$
are known.

Definition 2.5. Given probability measures $\mu$ and $\mu_2$. When there is
a unique probability measure $\mu_1$ such that $\mu = \mu_1 \boxtimes \mu_2$,
we will denote $\mu_1 = \mu \boxbslash \mu_2$. We say that $\mu_1$ is the
multiplicative free deconvolution of $\mu$ with $\mu_2$.

We can define additive free deconvolution similarly. Note that free
deconvolution is defined only for a subset of all probability measures, since
measures exist which cannot be expressed in the forms $\mu_1 \boxplus \mu_2$
or $\mu_1 \boxtimes \mu_2$. Deconvolution can, however, also be viewed as a
formal operation on a sequence of moments. Viewed as such, multiplicative free
deconvolution is well-defined when we have non-vanishing first moments. This
can be seen from the combinatorial description of multiplicative free
convolution (2.15). In light of (2.16), it is obvious from (2.15) that the
cumulants in $R_{\mu_1}$ can be calculated recursively from those of
$R_{\mu_2}$ and $R_{\mu_1 \boxtimes \mu_2}$, when the first coefficient of
$R_{\mu_2}$ (which equals the first moment) is known. Since the main theorem
relates to the moments of the involved measures (it is a statement on
convergence in distribution), we will in the following view deconvolution in
terms of the moments only.

A form of (2.16) which will be useful to us is for the case n = 1. If we
write $\mu_a = (\mu_a \boxbslash \mu_c) \boxtimes \mu_c$, we get

(2.17) i?M.HMc=i?M.0^;J- 

The facts we will use concerning boxed convolution are the following,
relating moment series, R-series of general random variables (in particular
projections and free Poisson random variables), the Zeta series, which is
defined as

$Zeta(z_1, \ldots, z_n) = \sum_k \sum_{i_1, \ldots, i_k} z_{i_1} \cdots z_{i_k}$,

the Moeb series (which is the inverse of Zeta under composition with
$\boxast$) and Id (which is the unit under composition with $\boxast$):

(2.18) $M_\mu = R_\mu \boxast Zeta$ and $R_\mu = M_\mu \boxast Moeb$,
$M_{\mu_p} = cZeta$ and $R_{\mu_c} = c^{n-1}Zeta$.

Here $p$ is a projection with $\phi(p) = c$. Our definition of $\mu_c$ differs
from that of 0], for purposes of compatibility with 14]. Consequently, the
expressions for the R-transforms are different. In the terminology of [7], the
R-series would be $cZeta$. The following definition [l^] will also be in use:

Definition 2.6. A pair $(a, b)$ of noncommutative random variables is
called an R-diagonal pair if its R-series is of the form

(2.19) $R_{\mu_{a,b}}(z_1, z_2) = \sum_{n=1}^{\infty} \alpha_n \left((z_1 z_2)^n + (z_2 z_1)^n\right)$.

An element $a$ will be said to be an R-diagonal element if $(a, a^*)$ is an
R-diagonal pair. The one-variable series $\sum_{n=1}^{\infty} \alpha_n z^n$
will be called the determining series of the R-diagonal pair $(a, b)$.

We will use the fact that if $a$ is an R-diagonal element, its determining
series can be written as $R_{\mu_{aa^*}} \boxast Moeb$ 0]. Two important
R-diagonal elements are

1. the Haar unitary, which can be defined as a unitary $u$ satisfying
$\phi(u^n) = 0$ for all $n \in \mathbb{Z} \setminus \{0\}$, and

2. the circular element, which can be defined as an element $s$ whose
*-distribution $\mu_{s,s^*}$ satisfies $R_{\mu_{s,s^*}}(z_1, z_2) = z_1 z_2 + z_2 z_1$.

The concept of R-diagonality was in fact invented in search of a common
approach for Haar unitaries and circular elements. Haar unitaries are very
important in asymptotic random matrix results. In $W^*$-probability spaces,
when the isometric part of an R-diagonal element has kernel equal to zero,
the isometric part is actually a Haar unitary.

2.1. Implementation of free convolution. While free convolution has an
abstract definition, the combinatorial description given in this section can
actually be used to obtain an efficient implementation. In many practical
cases, free convolution with $\mu_c$ is what we are interested in. Such free
convolution is simplified through the following result.

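Theorem 2.1.

(2.20) $(cM_\mu) \boxast Zeta = c \, M_{\mu \boxtimes \mu_c}$.

Proof. To see this, start by combining (2.16) with (2.18) to get

$R_{\mu \boxtimes \mu_c} = R_\mu \boxast R_{\mu_c} = R_\mu \boxast (c^{m-1}Zeta)$.

After convolving both sides with Zeta, we get

(2.21) $M_{\mu \boxtimes \mu_c} = M_\mu \boxast (c^{m-1}Zeta)$.

To prove (2.20), rewrite the coefficient of order m of the left hand side as

$\sum_{\pi \in NC(m)} c^{|\pi|} [coef_m; \pi](M_\mu)$.

Since $|\pi| + |K(\pi)| = m + 1$, this equals

$c \sum_{\pi \in NC(m)} [coef_m; \pi](M_\mu) \, c^{m - |K(\pi)|}$
$= c \sum_{\pi \in NC(m)} [coef_m; \pi](M_\mu) \, c^{m - |K(\pi)|} [coef_m; K(\pi)](Zeta)$
$= c \sum_{\pi \in NC(m)} [coef_m; \pi](M_\mu) \, c^m [coef_m; K(\pi)](c^{-1}Zeta)$
$= c \sum_{\pi \in NC(m)} [coef_m; \pi](M_\mu) \, [coef_m; K(\pi)](c^{m-1}Zeta)$
$= c \, [coef_m](M_\mu \boxast (c^{m-1}Zeta))$,

and substituting (2.21) proves the claim. □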
In summary, if we need to compute the moments of $\mu \boxtimes \mu_c$, one
can first compute the moment series $cM_\mu$, then use this to compute the
left hand side of (2.20). According to (2.20) and (2.21), the moment series of
$\mu \boxtimes \mu_c$ can then be computed from this with an additional
scaling with $\frac{1}{c}$.

In other words, convolving with $\mu_c$ is equivalent to convolving with
$\mu_1$ (with additional scalings of power series taken into account), since
$R_{\mu_1} = Zeta$. It turns out that boxed convolution with Zeta is easy to
compute, as the following result shows. The result is stated in terms of the
moment-cumulant formula, since the relation between cumulants and moments is
given by boxed convolution with Zeta.

Theorem 2.2.

(2.22) $[coef_m](M_\mu) = \sum_{k=1}^{m} [coef_k](R_\mu) \, [coef_{m-k}]\left((1 + M_\mu)^k\right)$.

Proof. For each $\pi \in NC(m)$, let $B_1 = \{b_{11}, \ldots, b_{1k}\}$ be the
block in $\pi$ containing 1, and let $NC(m, B_1)$ be the set of all
noncrossing partitions which contain $B_1$ as a block. Rewrite the definition
of boxed convolution (2.15) to

(2.23) $[coef_m](M_\mu) = \sum_{B_1} \sum_{\pi \in NC(m, B_1)} [coef_m; \pi](R_\mu) \, [coef_m; K(\pi)](Zeta) = \sum_{B_1} \sum_{\pi \in NC(m, B_1)} [coef_m; \pi](R_\mu)$.

Blocks in $\pi \in NC(m, B_1)$ other than $B_1$ must be entirely contained in
one of $\{b_{11} + 1, \ldots, b_{12} - 1\}, \ldots, \{b_{1k} + 1, \ldots, b_{11} - 1\}$
(the last set is understood cyclically, i.e. $b_{1(k+1)} = b_{11} + m$). This
means that the inner sum in (2.23) can be rewritten to

(2.24) $[coef_k](R_\mu) \prod_{i=1}^{k} \left( \sum_{\pi \in NC(b_{1(i+1)} - b_{1i} - 1)} [coef_{b_{1(i+1)} - b_{1i} - 1}; \pi](R_\mu) \right)$.

From the moment-cumulant formula it is seen that each sum here is simply
a moment, so we can rewrite to

$[coef_k](R_\mu) \prod_{i=1}^{k} [coef_{b_{1(i+1)} - b_{1i} - 1}](1 + M_\mu)$,

where the summand 1 in $1 + M_\mu$ accounts for elements $i$ in (2.24) with
$b_{1(i+1)} = b_{1i} + 1$ (i.e. consecutive elements in a block). All in all,
(2.23) can be rewritten to

(2.25) $\sum_k \sum_{B_1, |B_1| = k} [coef_k](R_\mu) \prod_{i=1}^{k} [coef_{b_{1(i+1)} - b_{1i} - 1}](1 + M_\mu)$.

Write $a_i = b_{1(i+1)} - b_{1i} - 1$, and note that

$\sum_{i=1}^{k} a_i = \sum_{i=1}^{k} \left( b_{1(i+1)} - b_{1i} - 1 \right) = m - k$.

The $a_i$ are in one-to-one correspondence with all candidates for $B_1$, so
that we can rewrite (2.25) to

$\sum_k [coef_k](R_\mu) \sum_{a_1, \ldots, a_k, \, \sum_i a_i = m - k} \prod_{i=1}^{k} [coef_{a_i}](1 + M_\mu)$.

The inner sum here is easily recognized as coefficient $m - k$ in the power
series $(1 + M_\mu)^k$ (one factor for each $a_i$). Putting things together we
get (2.22). □

In (2.22) we see that there is no reference to noncrossing partitions. (2.22)
can be used easily in calculating moments recursively from cumulants. The
coefficients in the power series $(1 + M_\mu)^k$ can be computed in terms of
k-fold (classical) convolution. This is done in [l^, where many multiplicative
free convolutions are computed based on (2.22). The actual implementation of
(2.22) used in p^] is contained in [16].
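
The recursion behind the moment-cumulant relation (the coefficient of order m of $M_\mu$ equals $\sum_{k=1}^{m} [coef_k](R_\mu)[coef_{m-k}]((1 + M_\mu)^k)$) can be sketched in a few lines. This is not the implementation referred to above but a minimal illustration; the function names `poly_mul`, `moments_from_cumulants` and `convolve_with_mp` are hypothetical. The last function follows the recipe of theorem 2.1: feed $cM_\mu$ into the boxed convolution with Zeta, then scale by $\frac{1}{c}$.

```python
def poly_mul(p, q, order):
    """Product of two coefficient lists, truncated at degree `order`."""
    out = [0] * (order + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= order:
                out[i + j] += pi * qj
    return out


def moments_from_cumulants(alpha, order):
    """Moments 1..order from free cumulants, with alpha[k-1] = [coef_k](R)."""
    one_plus_m = [1]  # coefficients of 1 + M(z), filled in degree by degree
    for m in range(1, order + 1):
        power = [1]  # (1 + M)^0
        total = 0
        for k in range(1, m + 1):
            power = poly_mul(power, one_plus_m, m)  # (1 + M)^k, truncated
            total += alpha[k - 1] * power[m - k]    # only needs moments < m
        one_plus_m.append(total)
    return one_plus_m[1:]


def convolve_with_mp(moments, c):
    """Moments of mu boxtimes mu_c from the moments of mu, via theorem 2.1."""
    scaled = [c * m for m in moments]  # the series c*M_mu
    return [x / c for x in moments_from_cumulants(scaled, len(moments))]


print(moments_from_cumulants([1, 1, 1, 1], 4))  # all cumulants 1: Catalan numbers
print(convolve_with_mp([1, 1, 1], 0.5))         # point mass at 1 convolved with mu_c
```

Convolving the point mass at 1 with $\mu_c$ returns $\mu_c$ itself, so the second line should reproduce its first moments $1$, $1 + c$ and $1 + 3c + c^2$.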

Free convolution as introduced here is defined only for compactly
supported probability measures.

3. Proof of theorem 1.1. In what follows we first sketch the proof of
theorem 1.1. After this follow the proofs of the theorems needed in the proof.

First we prove the following variant of lemma 4.3.2 in [7], which can be
used together with the Borel-Cantelli lemma to prove almost sure
convergence. It is slightly more general in the sense that boundedness in the
operator norm $\|\cdot\|$ is not assumed, only boundedness in $\|\cdot\|_p$
for $p > 1$. This weaker boundedness assumption is needed since uniformly
norm-bounded matrices are not sufficient to approximate all compactly
supported probability measures almost surely. Recall that an $n \times n$
unitary random matrix is called standard unitary if its distribution equals
the Haar probability measure on $U(n)$.

Theorem 3.1. Let $(U(s,n))_{s \in S}$ be an independent family of
$n \times n$ standard unitary random matrices. Let $s_1, \ldots, s_l \in S$,
$m_1, \ldots, m_l \in \mathbb{Z} \setminus \{0\}$, and let $R_p \geq 0$,
$p > 1$ be constants. Then

(3.1) $E\left( \left| \mathrm{tr}_n \left( U(s_1,n)^{m_1} D_1(n) U(s_2,n)^{m_2} D_2(n) \cdots U(s_l,n)^{m_l} D_l(n) \right) \right|^2 \right)$

is $O(n^{-2})$ as $n \to \infty$ uniformly for the choice of any
$D_r(n) \in M_n(\mathbb{C})$ ($1 \leq r \leq l$) such that for
$1 \leq r \leq l$ either

$\mathrm{tr}_n(D_r(n)) = 0$ and $\|D_r(n)\|_p \leq R_p$ ($n \in \mathbb{N}$)

or

$D_r(n) = I_n$ ($n \in \mathbb{N}$) and $s_r \neq s_{r+1}$ (with $s_{l+1} = s_1$).

Also, for a given $l$, there exists a $p_l$ such that the same statement holds
as long as the $\|\cdot\|_p$-norm bounds are satisfied for $p \leq p_l$ only.

The proof is in section 3.1. It somewhat simplifies the proof of lemma
4.3.2 in [7], and can also be used to simplify the proof of theorem 4.3.5
in [7]. As in [7], theorem 3.1 is sufficient to prove asymptotic freeness
almost everywhere for the family

$\left( (\{U(s,n), U(s,n)^*\})_{s \in S}, \{D(t,n), D(t,n)^* : t \in T\} \right)$

when the $D_r(n)$ are known to have a limit distribution. It will also be
useful to us that theorem 3.1 gives us bounds also in cases where the $D_r(n)$
do not converge to a limit. The $D_r(n)$ model in our case concerns
$\frac{1}{\sqrt{N}} R_n$ random matrices, for which it is not known whether an
almost sure limit exists (only that $\Gamma_n = \frac{1}{N} R_n R_n^*$ has an
almost sure limit). Also, theorem 3.1 gives us grounds for proving that only
the lower mixed moments converge to zero. It can be applied to cases where
only the lower $\|\cdot\|_p$-norms are known to be bounded, in which only
lower mixed moments can be bounded.

What we really want is to use random matrices $R_n$ independent from the
$U_n$ instead of the deterministic matrices $D_r(n)$. This is addressed by the
following theorem. We restrict to the case of one standard unitary random
matrix.

Theorem 3.2. Let $U_n$ be $n \times n$ standard unitary random matrices, and
let $R_n$ be random matrices independent from $U_n$, such that $R_n R_n^*$
converges in distribution almost surely to a compactly supported probability
measure $\rho$. Then

(3.2) $\left| \mathrm{tr}_n \left( U_n^{m_1} P_1(R_n) U_n^{m_2} P_2(R_n) \cdots U_n^{m_l} P_l(R_n) \right) \right| \to 0$ a.s.

uniformly for any choice of polynomials $P_1, \ldots, P_l$ such that
$\mathrm{tr}_n(P_i(R_n)) = 0$ for all $1 \leq i \leq l$.

The proof is in section 3.2. As for theorem 3.1, theorem 3.2 is sufficient to
prove asymptotic freeness almost everywhere for the family $(U_n, R_n)$ when
the $R_n$ are additionally known to have a limit distribution.

The proof is split in two: First (3.2) is shown for random matrices
satisfying bounds of the form $\|R_n\|_p \leq R_p$ ($p \geq 1$). The proof in
this case uses theorem 3.1 and is quite short. The more general case of
compactly supported probability measures is proved with an approximation
argument.

The next step is to pass from standard unitary random matrices $U_n$ to
standard Gaussian random matrices $X_n$. Note that it could be possible to
skip starting with standard unitary random matrices altogether, by building
directly on results for almost sure convergence of Gaussian random
matrices like those in [15]. We have chosen the approach with standard unitary
random matrices for compatibility with We will prove the following: 

Theorem 3.3. Let $X_n$ be $n \times n$ standard Gaussian random matrices, and
let $R_n$ be random matrices independent from $X_n$, such that $R_n R_n^*$
converges in distribution almost surely to a compactly supported probability
measure $\rho$. Then

$\left| \mathrm{tr}_n \left( Q_1(X_n) P_1(R_n) Q_2(X_n) P_2(R_n) \cdots Q_l(X_n) P_l(R_n) \right) \right| \to 0$ a.s.

uniformly for any choice of polynomials $Q_1, \ldots, Q_l$,
$P_1, \ldots, P_l$ such that

$\mathrm{tr}_n(Q_i(X_n)) = 0$ and $\mathrm{tr}_n(P_i(R_n)) = 0$

for all $1 \leq i \leq l$.

The proof is quite short, and also presented in section 3.2. Note that
the approximation argument used in the proof of theorem 4.3.5 in [7] does
not work in this case. As for theorem 3.2, theorem 3.3 is enough to prove
asymptotic freeness almost everywhere when the $R_n$ are additionally known
to have a limit distribution. Just as theorem 3.1 gives bounds for mixed
moments also in cases where the deterministic matrices do not converge in
distribution, theorem 3.2 and its counterpart for Gaussian random matrices
can be used to bound mixed moments in cases where it is only known that
the $R_n$ matrices satisfy $\|\cdot\|_p$-norm bounds.

To finish the proof we will model our situation through the following
theorem, which is stated independently of a random matrix setting.

Theorem 3.4. Suppose that $a$ and $\{p, b\}$ are $*$-free, with $a$ R-diagonal and $p$ a projection with $\phi(p) = c$. In the reduced probability space $(p\mathcal{A}p, \phi(p)^{-1}\phi)$, $\mu_{p(a+b)(a+b)^*p}$ is uniquely identified by $\mu_{paa^*p}$ and $\mu_{pbb^*p}$ through the equation

(3.3)    $\mu_{p(a+b)(a+b)^*p} \boxslash \mu_c = (\mu_{paa^*p} \boxslash \mu_c) \boxplus (\mu_{pbb^*p} \boxslash \mu_c)$.

In particular, $\mu_{p(a+b)(a+b)^*p}$ has no dependence on mixed moments of $a$ and $b$.

This will be proved in section 3.3. Note that there is no assumption on freeness between $p$ and $b$. The case $c = 1$ is particularly interesting, and corresponds to $p = I$. In this case, $\mu_{(a+b)(a+b)^*}$ is uniquely identified by $\mu_{aa^*}$ and $\mu_{bb^*}$, and (3.3) is simply

(3.4)    $\mu_{(a+b)(a+b)^*} \boxslash \mu_1 = (\mu_{aa^*} \boxslash \mu_1) \boxplus (\mu_{bb^*} \boxslash \mu_1)$.

This equation has an interpretation in terms of square random matrices. 

Due to (|3.3|) . i?-diagonality relieves us from dependencies of many mixed 
moments, so that some cancellation phenomenon must occur. This also hap- 
pens in other cases. If a and b are free, ^] expresses the distribution of the 
free commutator, i.e. 

(3.5)    $R_{\mu_{i(ab-ba)}}(z) = 2 \left( R^{(even)}_{\mu_a} \boxstar R^{(even)}_{\mu_b} \boxstar \mathrm{Zeta} \right)(z^2)$,

where $R^{(even)}(z) = \sum_{n=1}^\infty \alpha_{2n} z^n$ whenever $R(z) = \sum_{n=1}^\infty \alpha_n z^n$. (3.5) holds also when $a$ and $b$ are not R-diagonal. (3.5) also expresses a connection with multiplicative free convolution with $\mu_1$, since boxed convolution with the Zeta-series is involved.

Theorem 3.4 has a more general flavour than theorem 4.1, since the limits of $\frac{1}{N} X_n X_n^*$ from theorem 4.1 do not include all R-diagonal pairs. The following limiting version of theorem 3.4 will be useful in finishing the proof of theorem 4.1.

Theorem 3.5. Let the random variables $\{a_n, b_n, p_n\} \in (\mathcal{A}_n, \phi_n)$, $\{a, p\} \in (\mathcal{A}, \phi)$ be given, where $a$ is R-diagonal and $p, p_n$ are projections with $\phi(p) = \phi_n(p_n) = c$. Form the random variable $paa^*p$ in $(p\mathcal{A}p, \phi(p)^{-1}\phi)$, and the random variables $p_n b_n b_n^* p_n$ and $p_n(a_n + b_n)(a_n + b_n)^* p_n$ in $(p_n \mathcal{A}_n p_n, \phi_n(p_n)^{-1}\phi_n)$.

If 

and 

    $\mu_{p_n b_n b_n^* p_n} \to \mu$

in distribution, moments are uniformly bounded in $n$, and mixed moments of $(a_n, \{p_n, b_n\})$ go to 0, then $\mu_{p_n(a_n+b_n)(a_n+b_n)^*p_n}$ converges in distribution and the limit is uniquely identified by the equation

(3.6)    $\lim_{n \to \infty} \mu_{p_n(a_n+b_n)(a_n+b_n)^*p_n} \boxslash \mu_c = (\mu_{paa^*p} \boxslash \mu_c) \boxplus (\mu \boxslash \mu_c)$

Proof. The limiting moments of $\mu_{p_n(a_n+b_n)(a_n+b_n)^*p_n} \boxslash \mu_c$ do not change if we "zero out" the mentioned mixed moments (i.e. that we assume freeness of $(a_n, \{p_n, b_n\})$), due to the assumption on their vanishing and of uniform boundedness on moments. It is also easily seen that the limiting moments do not change if we change the distribution of $a_n$ to $\mu_{a_n} = \mu_a$ for all $n$. But then

where we used theorem 3.4, so that (3.6) holds. □



The rest of the proof of theorem 4.1 now goes as follows: The rectangular random matrices $R_n$ can be viewed as the $N \times N$ random matrices $p_n S_n$, where the projection $p_n$ is a diagonal constant matrix, with the fraction of 1's on the diagonal equal to $c$, and $S_n$ is an extension of the $n \times N$ matrix $R_n$ to an $N \times N$ matrix, obtained by adding zeros. Similarly, the random matrices $X_n$ can be viewed as the $N \times N$ matrices $p_n Y_n$, where $Y_n$ is an extension of the $n \times N$ matrix $X_n$ to an $N \times N$ matrix, obtained by adding more independent standard Gaussian entries.
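The embedding of a rectangular matrix into a square one can be checked on a small example. The following is a minimal sketch in plain Python, assuming $n = 2$, $N = 4$ (so $c = 1/2$) and fixed integer entries in place of random ones; the helper names `matmul`, `projection` and `pad_with_rows` are illustrative and do not come from the paper. It verifies that multiplying by the projection keeps the first $n$ rows and discards whatever was placed in the added rows.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def projection(n, N):
    """N x N diagonal projection with n ones followed by N - n zeros."""
    return [[1 if (i == j and i < n) else 0 for j in range(N)] for i in range(N)]

def pad_with_rows(m, extra_rows):
    """Extend an n x N matrix to N x N by appending the given rows."""
    return [row[:] for row in m] + [row[:] for row in extra_rows]

n, N = 2, 4                       # c = n / N = 1/2 (values assumed for the example)
R = [[1, 2, 3, 4],
     [5, 6, 7, 8]]                # stands in for the n x N matrix R_n
p = projection(n, N)

S = pad_with_rows(R, [[0] * N for _ in range(N - n)])   # extension by zeros
Y = pad_with_rows(R, [[9, 9, 9, 9], [7, 7, 7, 7]])      # extension by arbitrary rows

pS = matmul(p, S)
pY = matmul(p, Y)

# The fraction of ones on the diagonal of p equals c.
c = sum(p[i][i] for i in range(N)) / N
```

In both cases the product with the projection has the original rectangular matrix in its first $n$ rows and zeros elsewhere, so the choice of padding does not affect the compressed matrix.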

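Since $\frac{1}{N} S_n S_n^*$ almost surely converges to a compactly supported probability measure, $(\frac{1}{\sqrt{N}} Y_n, \frac{1}{\sqrt{N}} S_n)$ satisfies the requirements of theorem 3.3. Thus, mixed moments of $\frac{1}{\sqrt{N}} Y_n$ and $\frac{1}{\sqrt{N}} S_n$ go to zero almost surely. It is also seen that $\frac{1}{\sqrt{N}} S_n$ has its moments bounded as $n \to \infty$ almost surely. It is well known [7] that $\frac{1}{\sqrt{N}} Y_n$ converges in distribution almost surely to the circular law, which is R-diagonal.

Thus, all assumptions of theorem 3.5 are satisfied for $a_n = \frac{\sigma}{\sqrt{N}} Y_n$, $b_n = \frac{1}{\sqrt{N}} S_n$ and $p_n$, almost surely. Thus, almost surely,

    $\lim_{n \to \infty} \mu_{p_n \frac{1}{N}(S_n + \sigma Y_n)(S_n + \sigma Y_n)^* p_n} \boxslash \mu_c = (\mu_{p \sigma^2 y y^* p} \boxslash \mu_c) \boxplus (\mu_\Gamma \boxslash \mu_c)$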

or 

which is the statement of theorem 4.1.

The proof as sketched here assumes $c < 1$. An explanation for how the proof goes for $c > 1$ is given succeeding the proof of theorem 3.4.

3.1. The proof of theorem 3.1. The proof will use the (generalized) Hölder inequality:

Lemma 3.1. For matrices $A_1, \ldots, A_k$, the following holds:

    $\|A_1 \cdots A_k\|_p \le \|A_1\|_{p_1} \cdots \|A_k\|_{p_k}$ when $\sum_{i=1}^k \frac{1}{p_i} = \frac{1}{p}$.
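The generalized Hölder inequality can be sanity-checked in its simplest (commutative) special case. The sketch below assumes the normalized norm $\|A\|_p = (\mathrm{tr}_n |A|^p)^{1/p}$ and restricts to diagonal matrices, for which this norm is a normalized $\ell^p$-norm of the diagonal; both the restriction and the function names are choices made for this illustration only. It also checks that a diagonal matrix unit has $p$-norm $n^{-1/p}$ under this normalization.

```python
import random

def norm_p(diag, p):
    """Normalized p-norm (tr_n |A|^p)^(1/p) of a diagonal matrix given by its diagonal."""
    n = len(diag)
    return (sum(abs(x) ** p for x in diag) / n) ** (1.0 / p)

def product(diags):
    """Diagonal of the product of diagonal matrices."""
    out = [1.0] * len(diags[0])
    for d in diags:
        out = [x * y for x, y in zip(out, d)]
    return out

random.seed(0)
n = 50
exponents = [2.0, 3.0, 6.0]            # 1/2 + 1/3 + 1/6 = 1, so p = 1
p = 1.0 / sum(1.0 / q for q in exponents)
mats = [[random.uniform(-1, 1) for _ in range(n)] for _ in exponents]

lhs = norm_p(product(mats), p)         # ||A_1 A_2 A_3||_p
rhs = 1.0
for d, q in zip(mats, exponents):
    rhs *= norm_p(d, q)                # ||A_1||_{p_1} ||A_2||_{p_2} ||A_3||_{p_3}
holder_holds = lhs <= rhs + 1e-12      # small tolerance for rounding

# The diagonal matrix unit E_11 has p-norm n^(-1/p).
e11 = [1.0] + [0.0] * (n - 1)
unit_norm = norm_p(e11, 4.0)
```

The general (non-commuting) case needs singular values and is not covered by this sketch.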

In the proof of lemma 4.3.2 in [7], (3.1) is written as

[-) E E ( n^iM.)^'=w+i(*-''^) 

jl,...,i2fc = l Jfc(i),...,ifc(;)Jfc(i + i) + i,.--iifc(2i) + l 1 ^ 

(21 \ / 2k \ 

n 4.MA.M+i(*'-,n) n ^iHhi4h),eih),n) , 

r=/+l / \/i=l / 

where for $1 \le r \le l$,

    $k(r) = |m_1| + \cdots + |m_r|$,
    $k = k(l)$, $k(l + r) = k + k(r)$, $t_{l+r} = t_r$.

Moreover, for $h$ such that $k(r-1) + 1 \le h \le k(r)$,

    $s(h) = s_r$, $\varepsilon(h) = 1$ if $m_r > 0$, $\varepsilon(h) = -1$ if $m_r < 0$.

Here $s(h + k) = s(h)$ and $\varepsilon(k + h) = -\varepsilon(h)$ for $1 \le h \le k$, and



, \ _ j Uij{s,n) if e = 1 
Uij[s,e,n) - < if e = -1 



Since (3.7) is a matrix product written out, the following must hold:

(3.8)    $j_h = i_{h+1}$ for $h \in \{1, \ldots, k\} \setminus \{k(1), \ldots, k(l)\}$,
         $i_h = j_{h+1}$ for $h \in \{k+1, \ldots, 2k\} \setminus \{k(l+1), \ldots, k(2l)\}$.

Also, due to the vanishing of many mixed moments of entries in standard unitary random matrices (lemma 4.2.2 in [7]), two pair partitions $\mathcal{U}$ and $\mathcal{V}$ can be chosen so that if $\{h, h'\} \in \mathcal{U}$ then

(3.9)    $s(h) = s(h')$, $\varepsilon(h) = 1$, $\varepsilon(h') = -1$, $i_h = j_{h'}\ (= i_{h'+1})$,

and if $\{h, h'\} \in \mathcal{V}$ then

(3.10)    $s(h) = s(h')$, $\varepsilon(h) = -1$, $\varepsilon(h') = 1$, $i_h = j_{h'}\ (= i_{h'+1})$.

These two pair partitions and (3.8) cause many equalities among the $i_1, \ldots, i_{2k}$, and define the equivalence relation $\mathcal{R}(\mathcal{U}, \mathcal{V})$ on $\{1, \ldots, 2k\}$ so that $i_h = i_{h'}$ whenever $h$ and $h'$ are in the same equivalence class of $\mathcal{R}(\mathcal{U}, \mathcal{V})$. We let $k_0$ denote the number of equivalence classes of $\mathcal{R}(\mathcal{U}, \mathcal{V})$, and let $h(1), \ldots, h(k_0)$ be representatives from the equivalence classes.
Recall the expressions 

(3.11) Cn{Ll,.:,tko) = (l[dj^^r)ik(r)+iitr,n)\ I n 4.MifcM+i(*r,n) I , 

\r=l / \r=Z+l / 



(3.12)    $Q_n(\iota_1, \ldots, \iota_{k_0}) = E\left( \prod_{h=1}^{2k} u_{i_h j_h}(s(h), \varepsilon(h), n) \right)$

from [7], where $(\iota_1, \ldots, \iota_{k_0})$ in (3.11) are defined as $(i_{h(1)}, \ldots, i_{h(k_0)})$ (representatives of the equivalence classes), and $j_1, \ldots, j_{2k}$ are determined subject to $\mathcal{U}, \mathcal{V}$. (4.3.6) of [7] says that it is enough to prove that for any partition $\mathcal{W}$ of $\{1, \ldots, k_0\}$ we have that

(3.13)    $\sum_{(\iota_1, \ldots, \iota_{k_0}): \mathcal{W}} C_n(\iota_1, \ldots, \iota_{k_0})\, Q_n(\iota_1, \ldots, \iota_{k_0}) = O(1)$ as $n \to \infty$,

where the summation is over $(\iota_1, \ldots, \iota_{k_0})$ such that $\iota_p = \iota_q$ if and only if $p$ and $q$ are in the same block of $\mathcal{W}$. For a given $\mathcal{W}$, it is known that $|Q_n(\iota_1, \ldots, \iota_{k_0})| = O(n^{-k})$ uniformly for $i_k, j_k$ as $n \to \infty$, and that it has the same value for all $(\iota_1, \ldots, \iota_{k_0})$ taking part in the sum (3.13). So, from (3.13) we deduce that it is enough to show that for any choice of $\mathcal{U}, \mathcal{V}, \mathcal{W}$ (there is a finite number of such choices),

(3.14)    $\sum_{(\iota_1, \ldots, \iota_{k_0}): \mathcal{W}} C_n(\iota_1, \ldots, \iota_{k_0})$

is $O(n^k)$. In [7] this is proved using the fact that (3.11) are bounded uniformly. This is not true in our case since only uniform boundedness in $\|\cdot\|_p$ is assumed. Instead, we will group sums of terms into matrix multiplication units, and use the Hölder inequality together with the $\|\cdot\|_p$-norm bounds. Instead of the terms in (3.14), where the sum is over

    $(\iota_1, \ldots, \iota_{k_0})$: $\iota_p = \iota_q$ if and only if $p$ and $q$ are in the same block of $\mathcal{W}$,

it will be better for us to sum over

    $(\iota_1, \ldots, \iota_{k_0})$: $\iota_p = \iota_q$ if $p$ and $q$ are in the same block of $\mathcal{W}$.

The latter set is more compatible with indices in multiplications of many matrices. This second set is larger than the first, and can be written as

(3.15)    $\sum_{\mathcal{W}' \ge \mathcal{W}} \sum_{(\iota_1, \ldots, \iota_{k_0}): \mathcal{W}'} C_n(\iota_1, \ldots, \iota_{k_0})$

It is obvious that (3.14) can be written

(3.16)    $\sum_{\mathcal{W}' \ge \mathcal{W}} a_{\mathcal{W}'} \sum_{\mathcal{W}'' \ge \mathcal{W}'} \sum_{(\iota_1, \ldots, \iota_{k_0}): \mathcal{W}''} C_n(\iota_1, \ldots, \iota_{k_0})$,

where $a_{\mathcal{W}'}$ are integer constants which can easily be calculated (proving (3.16) boils down to splitting all values of $\iota_1, \iota_2$ into those where $\iota_1 = \iota_2$, and those where $\iota_1 \ne \iota_2$. This is done recursively and for all $\iota_i$ to yield (3.16)). Since there is a finite number of elements in the two outer sums in (3.16), to prove that (3.14) is $O(n^k)$ it is enough to show that (3.15) is $O(n^k)$ for any choice of $\mathcal{W}$.

Let $l_0$ denote the number of equivalence classes in $\mathcal{R}(\mathcal{U}, \mathcal{V})$ with only one entry, and let $h(1), \ldots, h(l_0)$ be the corresponding representatives. According to [7], equivalence classes with only one entry give rise to factors of the form $d_{\iota\iota}$ in (3.11), where $d$ is either $d_{\iota\iota}(t_r, n)$ or $d^*_{\iota\iota}(t_r, n)$ for some $r$. Equivalence classes with only one entry thus lead (through summation over one $\iota_i$ appearing in just one factor) to factors in (3.15) which are (non-normalized) traces of the $D(t_r, n)$. These are zero, so we can assume that $l_0 = 0$ when we attempt to bound (3.15). Had we used the sum (3.14) instead of (3.15), we would not obtain zero.

So we assume that there are no singleton equivalence classes, i.e. $k_0 \le k$. Let $K_0$ be the number of equivalence classes actually appearing in (3.11) (this is a function of $\mathcal{U}$ and $\mathcal{V}$). We have that $K_0 \le k_0$, but equality does not necessarily hold. We will use matrix units $E_{ij}$ (i.e. the matrix whose only nonzero entry is a 1 in position $(i, j)$). By placing matrix units $F_i$ with indices from $\iota_1, \ldots, \iota_{k_0}$ in between the terms in (3.11), (3.15) can be written as

(3.17)    $\sum_{\mathcal{W}' \ge \mathcal{W}} \sum_{(\iota_1, \ldots, \iota_{k_0}): \mathcal{W}'} n\, \mathrm{tr}_n\left( \prod_{i=1}^{2l} (F_i D_i) \right)$,

where $D_i$ are matrices from $D(t_r, n)$ or one of their transposes/conjugates. Since $\|F_i\|_p = n^{-1/p}$ and the number of possible choices of matrix units is $n^{K_0}$, lemma 3.1 implies that (3.17) is bounded by

21 21 

(3.18) \\Di\Ui = J] \\Di\Ui. 

1=1 i=l 

Since $K_0 \le k$, this is $O(n^k)$ except possibly in the case when $K_0 = k$, i.e. when all equivalence classes have exactly two elements.

So, for the rest of the proof, we assume that all equivalence classes have 
exactly two elements. Note that the number of times an equivalence class 
appears as an i is equal to the number of the times the same class appears 
as a J in (I3.11|) . This is obvious from the way the equivalence relation is 
defined ^Mi, ^Mi in order to avoid a zero value in (|3T2]) . This 

means that we can take the first of the Kq equivalence classes appearing 
in (jS.lip . and rearrange the terms in (j3.1ip so that the equivalence class 
appear in alternating order as an i and as a j. (j3.17p can thus be rewritten 
to 

(3.19) E ntrn{FiDiGiD2F2D^G2D^F^) , 

where $D_1, D_2, D_3, D_4$ are the matrices where the first equivalence class appears as an $i$ or a $j$, and in alternating order. Also, $F_1 = F_2 = F_3 = E_{rr}$ are matrix units, $r$ is a given number, and the $G_i$ are products of many of the matrices $D_i$ in (3.17). (3.19) can also be written

(3.20) J2 E ntrnidiag{DiGiD2)diagiD3G2D4)), 

W'>>V(n,...,ifc„):n" 

where diag{A) stands for the diagonal of the matrix A. Similarly to the 
calculation of the bound (j3.18p . (j3.20p is seen to be bounded by 

(3.21) n'''^\\diag{DiGiD2)\\2\\diag{DsG2Di)\\2. 
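Note that $\|\mathrm{diag}(A)\|_2 \le \|A\|_2$, since

    $\|\mathrm{diag}(A)\|_2^2 = \frac{1}{n} \sum_i |a_{ii}|^2 \le \frac{1}{n} \sum_{i,j} |a_{ij}|^2 = \mathrm{tr}_n(A^* A) = \|A\|_2^2$,

where $A = (a_{ij})_{i,j}$. This means that (3.21) is bounded by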

111)1011)2112^3^21)4112, 

which is $O(n^{k_0})$ and hence $O(n^k)$ since all $D_i$ are bounded in $p$-norm, and the only other factors are matrix units, which have $p$-norm $n^{-1/p}$.
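The comparison between the norm of the diagonal part of a matrix and the norm of the full matrix, used in the last step, is easy to check numerically. The following is a small sketch in plain Python, assuming the normalized Hilbert-Schmidt norm $\|A\|_2^2 = \mathrm{tr}_n(A^*A) = \frac{1}{n}\sum_{i,j}|a_{ij}|^2$; the function names and the random test matrix are illustrative only.

```python
import random

def hs_norm_sq(a):
    """Squared normalized Hilbert-Schmidt norm tr_n(A^* A) = (1/n) sum |a_ij|^2."""
    n = len(a)
    return sum(abs(x) ** 2 for row in a for x in row) / n

def diag_part(a):
    """The matrix diag(A): keep the diagonal of A and zero out the rest."""
    n = len(a)
    return [[a[i][j] if i == j else 0 for j in range(n)] for i in range(n)]

random.seed(1)
n = 6
A = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
     for _ in range(n)]

diag_sq = hs_norm_sq(diag_part(A))   # ||diag(A)||_2^2
full_sq = hs_norm_sq(A)              # ||A||_2^2
off_diag_sq = full_sq - diag_sq      # contribution of the entries with i != j
```

The difference between the two squared norms is exactly the normalized sum of the squared moduli of the off-diagonal entries, which is why the inequality can never fail.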

That there exists a $p_l$ for a given $l$ as in the last statement of the theorem is obvious from the proof and the way the Hölder inequality was used. This completes the proof.

3.2. The proofs of theorem 3.2 and 3.3. First assume that $R_n$ satisfies $\|R_n\|_p \le R_p$ ($p \ge 1$) almost surely for some constants $R_p$. $P_i(R_n)$ satisfies similar $\|\cdot\|_p$-norm bounds due to lemma 3.1. Call the underlying probability space $\Omega$. Denote by $f_{U_n,R_n}(U, R)$ the joint density of $R_n$ and $U_n$, and by $f_{U_n}(U)$, $f_{R_n}(R)$ the marginal densities. Due to independence, $f_{U_n,R_n}(U, R) = f_{U_n}(U) f_{R_n}(R)$, and therefore

(3.22)
    $\int_\Omega \left| \mathrm{tr}_n\left( U_n^{m_1} P_1(R_n) \cdots U_n^{m_l} P_l(R_n) \right) \right|^2 ds$
    $= \int_{M_n(\mathbb{C})} \int_{M_n(\mathbb{C})} \left| \mathrm{tr}_n\left( U^{m_1} P_1(R) \cdots U^{m_l} P_l(R) \right) \right|^2 f_{U_n,R_n}(U, R)\, dU\, dR$
    $= \int_{M_n(\mathbb{C})} \int_{M_n(\mathbb{C})} \left| \mathrm{tr}_n\left( U^{m_1} P_1(R) \cdots U^{m_l} P_l(R) \right) \right|^2 f_{U_n}(U)\, dU\, f_{R_n}(R)\, dR$
    $\le \int_{M_n(\mathbb{C})} C n^{-2} f_{R_n}(R)\, dR = C n^{-2}$,

where we have used the bounds for deterministic matrices from theorem 3.1. Therefore

    $\left| \mathrm{tr}_n\left( U_n^{m_1} P_1(R_n) U_n^{m_2} P_2(R_n) \cdots U_n^{m_l} P_l(R_n) \right) \right| \to 0$ a.s.

for such random matrices $R_n$. If $R_n R_n^*$ is just known to converge in distribution almost surely to a compactly supported probability measure, observe that almost surely there exists a value $R$ so that $\|R_n\|_p \le R$ for $n$ large enough. For each $l$ choose $p_l$ as in the statement of theorem 3.1. Denote by $\Omega_{p_l,N}$ ($p_l \ge 1$, $N \in \mathbb{N}$) the subset of $\Omega$ determined by values $s$ such that

    $\|R_n(s)\|_{p_l} < N$

for all $n$. Define $R_{n,p_l,N} = \chi_{\Omega_{p_l,N}} R_n$ with $\chi$ denoting the characteristic function. The $R_{n,p_l,N}$ satisfy the estimates (3.22) for mixed moments of length $\le l$, so that these mixed moments go to zero almost surely in $\Omega_{p_l,N}$. $\cup_N \Omega_{p_l,N}$ has probability 1: Almost surely, the $\|\cdot\|_{p_l}$-norm of $R_n$ stays bounded by some finite value for large enough $n$. Thus, for every $s$ in a set with probability one, we can find a value $N_s$ such that $\|R_n(s)\|_{p_l} < N_s$ for ALL $n$. But then $s \in \Omega_{p_l,N_s}$, so that $\cup_N \Omega_{p_l,N}$ has probability 1 as claimed. Since $\cup_N \Omega_{p_l,N}$ has probability 1, theorem 3.2 follows from the fact that $R_n = R_{n,p_l,N}$ on $\Omega_{p_l,N}$. By increasing $l$ we get almost sure convergence of higher mixed moments to zero also.
Now for theorem 3.3. Write

(3.23)    $X_n = U_n \Lambda_n U_n^*$

for a unitary random matrix $U_n$, and diagonal random matrix $\Lambda_n$. We may assume that $U_n$ is a standard unitary random matrix, as in the proof of theorem 4.3.5 of [7], since Gaussian random matrices are unitarily invariant. We can also assume that $\Lambda_n$ is independent from $U_n$, so that $(U_n, \{\Lambda_n, R_n\})$ is an independent family. $R_n$ converges to a limit which is compactly supported, and $\Lambda_n$ does the same. Since (3.22) can be easily generalized to the case where the $R_n$ are replaced with several different random matrices (all independent from $U_n$), we conclude also for theorem 3.3 that we get almost sure convergence to zero of mixed moments as in definition 2.1.

3.3. The proof of theorem 3.4. First write $\phi\left((p(a+b)(a+b)^*p)^m\right)$ as a sum of mixed moments of length $3m$ by multiplying out $(p(a+b)(a+b)^*p)^m$:

(3.24)    $\phi\left((p(a+b)(a+b)^*p)^m\right) = \sum_{\sigma_1 \le \sigma} \phi\left(x_1 x_2^* p \cdots x_{2m-1} x_{2m}^* p\right)$,

where $\sigma = \{1, 2, 4, 5, \ldots, 3m-2, 3m-1\}$ ($\{1, 2, 4, 5, \ldots, 3m-2, 3m-1\}$ correspond to the indices of the locations of the $x_i, x_i^*$ in the moments $x_1 x_2^* p \cdots x_{2m-1} x_{2m}^* p$), $\sigma_1$ runs over all subsets of $\sigma$, and $x_i = a$ if $i \in \sigma_1$, $x_i = b$ if $i \in \sigma \setminus \sigma_1$. Denote by $|\sigma_1|$ the cardinality of $\sigma_1$. We denote by $\alpha$ the cumulants of $\mu_{a,a^*}$ and $\beta$ the cumulants of $\mu_{b,b^*,p}$, so that the moment-cumulant formula for $(a, a^*)$ is

(3.25)    $\phi(x_{i_1} \cdots x_{i_n}) = \sum_{\pi = \{B_1, \ldots, B_k\} \in NC(n)} \prod_{i=1}^k [\mathrm{coef}(i_1, \ldots, i_n)|B_i](R_{\mu_{a,a^*}})$

with $x_1 = a$, $x_2 = a^*$ and $i_j = 1$ or $2$, and the moment-cumulant formula for $(b, b^*, p)$ is

(3.26)    $\phi(x_{i_1} \cdots x_{i_n}) = \sum_{\pi = \{B_1, \ldots, B_k\} \in NC(n)} \prod_{i=1}^k [\mathrm{coef}(i_1, \ldots, i_n)|B_i](R_{\mu_{b,b^*,p}})$,

with $x_1 = b$, $x_2 = b^*$, $x_3 = p$ and $i_j = 1$, $2$ or $3$. We will use the shorthand notation

    $\alpha_{B_i} = [\mathrm{coef}(i_1, \ldots, i_n)|B_i](R_{\mu_{a,a^*}})$
    $\beta_{B_i} = [\mathrm{coef}(i_1, \ldots, i_n)|B_i](R_{\mu_{b,b^*,p}})$
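For a single random variable the moment-cumulant formula reduces to $m_n = \sum_{\pi \in NC(n)} \prod_{B \in \pi} \kappa_{|B|}$, and this special case can be evaluated by brute force for small $n$. The sketch below enumerates all set partitions and keeps the noncrossing ones; it is meant only as an illustration of the summation over $NC(n)$, not as an efficient algorithm. As a test case it assumes free cumulants $\kappa_k = c^{k-1}$, i.e. a Marchenko Pastur law normalized to have mean 1; this normalization is an assumption of the example.

```python
from itertools import combinations

def set_partitions(elements):
    """Yield all partitions of a list, each as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition                      # `first` in its own block
        for i in range(len(partition)):                  # or joined to an existing block
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]

def is_noncrossing(partition):
    """True if no two blocks contain a < b < c < d with a, c in one block and b, d in the other."""
    for b1, b2 in combinations(partition, 2):
        for a, c in combinations(sorted(b1), 2):
            inside = [x for x in b2 if a < x < c]
            outside = [x for x in b2 if x < a or x > c]
            if inside and outside:
                return False
    return True

def noncrossing_partitions(n):
    return [p for p in set_partitions(list(range(1, n + 1))) if is_noncrossing(p)]

def moment(n, kappa):
    """n-th moment from free cumulants: sum over NC(n) of the product of kappa(|B|)."""
    total = 0.0
    for partition in noncrossing_partitions(n):
        term = 1.0
        for block in partition:
            term *= kappa(len(block))
        total += term
    return total

c = 0.5
mp_moments = [moment(n, lambda k: c ** (k - 1)) for n in (1, 2, 3)]
nc4_size = len(noncrossing_partitions(4))
```

With $c = 1/2$ the three moments are $1$, $1 + c$ and $1 + 3c + c^2$, and $NC(4)$ has 14 elements (the fourth Catalan number).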

Due to the freeness of $a$ and $\{p, b\}$, the moment-cumulant formula applied to all moments in (3.24) and (2.14) yields

(3.27)    $\sum_{\sigma_1 \le \sigma} \ \sum_{\pi_1 \in NC(|\sigma_1|),\ \pi_1 \le \sigma_1} \ \sum_{\pi_2 \in NC(|\sigma_1^c|),\ \pi_2 \le \sigma_1^c,\ \text{no crossings between } \pi_1 \text{ and } \pi_2} \alpha_{\pi_1} \beta_{\pi_2}$,

where $\sigma_1^c = \{1, \ldots, 3m\} \setminus \sigma_1$ and $|\sigma_1^c|$ is the cardinality of $\sigma_1^c$. $\pi_1$ divides
{1, |(Ti|} into |i^(7ri)| sets (see definition 12.31 and figured]) when tti is 
viewed as an element in NC{\ai\). vri also divides {1, ...,3m} into the same 
number of sets, according to the circular representation of {!,..., 3m}. Let 
us denote these blocks by -Bi, • • • Bk, so that erf = {Bi, • • • , Bk} as a sub- 
partition of l |g-c|. Since vri and tt2 have no crossings if and only if tt2 < 
{Bi, -Bfc}, (|3.27p can be written as 

(3.28) EE E 

ai<a 7rieNC{\tTi\) 7r2eiVC{|tTj|) 

T1<<^1 7r2<{Bi,-,Bfc} 

When 7r2 < {-Bi, ...,-Bfc} we can write Bi = 7r2ii U 7r2i2, • • • , where the 7r2ij 
are the reindexed blocks of tt2 which are contained in Bi, and where TT2i = 
{'^2ii,'^2i2,- This is in NC{\Bi\) since 7r2 is noncrossing. First rewrite 
([3:^8]) to 

(3.29) E E n 

<Tl<(T,riSiVC{|<Ti|) y7r2,eAfC(|Bi|) j=l / 

7ri<cri 

Then note that the ■K2i G NC{\Bi\) can be summed independently of one 
another, so that we can rewrite to 



(3.30) E "-ill E 

cri<cr,riGiVC{|cTi|) i=l \7r2iG A^Cd | ) 

7ri<cri 

Note also that only vri with blocks 

where Xi^^_^, Xi^^ are alternating values of a and a*, give contribution 
in (3.30), due to R-diagonality of $a$. Hold such a $\pi_1$ fixed in (3.30), and take a look at the inner sum in (3.30) for a given $i$. This is simply the moment-cumulant formula (3.26) for a moment of length $|B_i|$, where the mixed moment is on the form

    $pbb^*pbb^*p \cdots bb^*p$,

or on the form

    $b^*pbb^*pbb^*p \cdots bb^*pb$

due to the alternating structure in (3.24). In both cases the moment-cumulant formula yields $\phi\left((pbb^*p)^{|B_i|/3}\right)$ for the inner sum in (3.30). Therefore, we get that (3.30) equals

(3.31) E o.n,f[4>({pbb*p)'-^). 

CT1<CT TrjGiVCdCTil) i=l 
7ri<CTl 

Since $a$ is R-diagonal, the $\alpha_{\pi_1}$ which give contribution in (3.31) are uniquely identified by the moments $\phi((aa^*)^m)$. Therefore, the moments

    $\phi\left((p(a+b)(a+b)^*p)^m\right)$

are entirely identified by the moments $\phi((aa^*)^m)$ and $\phi((pbb^*p)^m)$, so that $\mu_{p(a+b)(a+b)^*p}$ only depends on $\mu_{aa^*}$ and $\mu_{pbb^*p}$. All distributions are here in the noncommutative probability space $(\mathcal{A}, \phi)$, not yet in the reduced space $(p\mathcal{A}p, \phi(p)^{-1}\phi)$. If we can prove the theorem when $p$ and $b$ are free, it will also hold when $p$ and $b$ are not free since $\mu_{p(a+b)(a+b)^*p}$ only depends on $\mu_{aa^*}$ and $\mu_{pbb^*p}$.

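We can replace $\mu_{b,b^*}$ with the (unique) R-diagonal pair $b_0$ so that $\mu_{bb^*} = \mu_{b_0 b_0^*}$. So, we assume that both $a$ and $b$ give rise to free R-diagonal pairs. Their determining series are $R_{\mu_{aa^*}} \boxstar \mathrm{Moeb}$ and $R_{\mu_{bb^*}} \boxstar \mathrm{Moeb}$, respectively. $(a+b, (a+b)^*)$ is also an R-diagonal pair, with determining series $R_{\mu_{aa^*}} \boxstar \mathrm{Moeb} + R_{\mu_{bb^*}} \boxstar \mathrm{Moeb}$. This means that

(3.32)    $R_{\mu_{(a+b)(a+b)^*}} \boxstar \mathrm{Moeb} = R_{\mu_{aa^*}} \boxstar \mathrm{Moeb} + R_{\mu_{bb^*}} \boxstar \mathrm{Moeb}$.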

If x is free fromp, we next calculate Rfi^^^,^^ in the reduced space {pAp, (j){p)~^(j)). 
We call this Rf/^^ in the rest of the proof, with similar notation for the mo- 

r''pxx*p 

ment series (In the rest of the paper, this notation is dropped since pxx*p is 
assumed to be in (pAp,(t>(p)~^cl))). Note that MP-^p = -Mn * . We have 

jlpAp ^ j^pAp ^Moeb =(-Ma A RMoe6 

^^pxx*p H-pxx*pl—l \C f^pxx'p J LJ 

= f z(^m...0^mJ) EMoeb = (i {R,^^,^{cZeta))) ^oeb 

= lHR>^xxM^Zeta)))^{l{cMoeb)) 

= c-"-i {Rf,^^,^{cZeta)^{cMoeb)) 

= c— 1 {R,^Mc-^'ld)) = c— 1 {R,^M^^m 

= c---~Uc'-R,^^,)=c--'R,^^.. 

For a general Marchenko Pastur law /i^, R^^ = d'^^^Zeta, and it is easily 
verified that R~^ = d^~~^Moeb. We have that 

Kt*P^R,' = (^""'^M..O Hd--'Moeb) = c-d- ((c-'R,^^,)g{d-'Moeb)) 
If $c = d$, this can be simplified to

(3.33) c"c"c-"-i (i?^_.gMoe6) = c""i {R^^^,gMoeb) . 

Using this for $x = a$, $x = b$ and $x = a + b$, and also using (3.32), we get

Using (2.17), this can be written
which can equivalently be stated as 

which is what we had to prove. 

Note that if $c \ne d$, $R_{\mu_{xx^*}} \boxstar \mathrm{Moeb}$ does not appear as a factor in (3.33). Therefore, there is no reason why the result should hold for other Marchenko Pastur laws than $\mu_c$, since (3.32) can not be used in such cases.

Although the proof of theorem 1.1 is described for $c < 1$ only, the methods used in the proof of theorem 3.4 here can help us prove the case for $c > 1$ also. If $R_n$ and $X_n$ are the random matrices from theorem 1.1 and $c > 1$, note that



1 , , 1 



^J■1_ 



where c"^f denoted the power series defined by [coefk]{c"^f) = c'^[coe/fc](/). 

From this one can show that 

(3.34) 



-m— -1 



^_rn-ij^j ^ ]mioeMc"'~^Moeb) 



(c-™)M^^^^^jg(c-Moe6)j j %Moeb 
c-"^-i(M^,^,^ gMoe6)^ g(c-i(c-'"+iMoe6)) 

\ -RnRn'—' '-' / 
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Since theorem 11.11 has been proved for c < 1, R-R/, ! wUl 

converge to 



as $n \to \infty$ ($c > 1$), and the result for $c > 1$ follows from (3.34).

4. Equivalence with known expressions for limit distributions of 
Information-Plus-Noise Type Matrices. [M] studies systems where 
the sample covariance matrix is formed by taking independent samples of a 
system of the form 



(the noise factor a does not appear in [ij]) where Xn and Wn are independent 



standard (zero mean, unit variance) Gaussian random vectors, and $A_n$ is an $n \times L$ matrix. The covariance matrix of the system is $\Theta_n = A_n A_n^* + \sigma^2 I$. In particular, when there is no noise (i.e. $\sigma = 0$), the covariance is $\Theta_n = A_n A_n^*$.



Denote by $\mu_\Theta$ the limiting eigenvalue distribution of $A_n A_n^*$. [IJ] states that the limiting eigenvalue distribution of the sample covariance matrix of the system is $(\mu_\Theta \boxplus \mu_{\sigma^2 I}) \boxtimes \mu_c$. When there is no noise, the limit is $\mu_\Theta \boxtimes \mu_c$. This way of passing from

    $\mu_\Theta \boxtimes \mu_c$ to $(\mu_\Theta \boxplus \mu_{\sigma^2 I}) \boxtimes \mu_c$

is of course compatible with theorem 11.11 We will also show that it is equiv- 
alent with the results in The following restrictions taken from will be 
used: 

1. For $n = 1, 2, \ldots$, $X_n = (X^n_{ij})$, $n \times N$, i.d. for all $i, j, n$, independent across $i, j$ for each $n$, and $E|X^n_{11} - E X^n_{11}|^2 = 1$.

2. $R_n$ is $n \times N$ and independent of $X_n$, with $F^{\Gamma_n} \to F^{\mu_\Gamma}$.

Theorem 1.1 in [3] expresses a relationship for finding the limiting eigenvalue distribution $\mu_W$ of $W_n$ from that of $\Gamma_n = \frac{1}{N} R_n R_n^*$ (denoted $\mu_\Gamma$). More precisely, under the conditions 1) and 2), we have that (in a slightly rewritten form) $F^{W_n} \to F^{\mu_W}$ almost surely, where $F^{\mu_W}$ is a nonrandom p.d.f. characterized by

(4.1)    $m_{\mu_W}(z) = \int \frac{dF^{\mu_\Gamma}(t)}{\frac{t}{1 + \sigma^2 c\, m_{\mu_W}(z)} - (1 + \sigma^2 c\, m_{\mu_W}(z)) z + \sigma^2 (1 - c)}$

for any $z \in \mathbb{C}^+$. The connection with multiplicative free convolution is hard to see from this formula. To obtain this connection, the following lemma is needed, which will be proved in section 4.1.

Lemma 4.1. (4.1) is equivalent to

(4.2)    $m_{\mu_W}^{-1}\left( \frac{z}{1 - \sigma^2 c z} \right) = (1 - \sigma^2 c z)^2\, m_{\mu_\Gamma}^{-1}(z) + \sigma^2 (1 - c)(1 - \sigma^2 c z)$,

for $z$ in some interval $(0, z_1)$.



In (4.2) and all other places where the inverse of the Stieltjes transform is taken in this paper, we will mean the unique inverse on the negative real line. The inverse will only be calculated for positive values close to 0. It will turn out that (4.2) can be more conveniently expressed in terms of distributions obtained from multiplicative free deconvolution with the Marchenko Pastur law using the following lemma, which will be proved in section 4.2.

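Lemma 4.2. If

(4.3)    $\mu_\Gamma = \mu_\Theta \boxtimes \mu_c$

then, for $z$ in some interval $(0, z_1)$,

(4.4)    $\eta_{\mu_\Gamma}^{-1}(z) = \frac{\eta_{\mu_\Theta}^{-1}(z)}{1 - c + cz}$

and also

(4.5)    $m_{\mu_\Gamma}^{-1}\left( \frac{z}{1 - c - c z\, m_{\mu_\Theta}^{-1}(z)} \right) = m_{\mu_\Theta}^{-1}(z) \left( 1 - c - c z\, m_{\mu_\Theta}^{-1}(z) \right)$.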
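The relation (4.4) between inverse $\eta$-transforms can be checked numerically in the simplest case $\mu_\Theta = \delta_1$, where $\mu_\Gamma = \mu_c$ and $\eta_{\mu_\Theta}^{-1}(y) = (1 - y)/y$. The sketch below rests on two assumptions that are not restated in this section: the $\eta$-transform is taken to be $\eta_\mu(z) = \int (1 + zt)^{-1} d\mu(t)$, and $\mu_c$ (for $0 < c < 1$) is taken to be the Marchenko Pastur law with density $\sqrt{(t - a)(b - t)}/(2\pi c t)$ on $[a, b] = [(1 - \sqrt{c})^2, (1 + \sqrt{c})^2]$. The function names are illustrative.

```python
import math

def eta_marchenko_pastur(z, c, steps=4000):
    """eta(z) = integral of 1/(1 + z t) against the Marchenko Pastur law (0 < c < 1).

    With t = 1 + c + 2 sqrt(c) cos(theta), the measure
    sqrt((t - a)(b - t)) / (2 pi c t) dt becomes 2 sin(theta)^2 / (pi t) d(theta).
    """
    total = 0.0
    h = math.pi / steps
    for k in range(steps):
        theta = (k + 0.5) * h                    # midpoint rule on [0, pi]
        t = 1.0 + c + 2.0 * math.sqrt(c) * math.cos(theta)
        total += 2.0 * math.sin(theta) ** 2 / (math.pi * t) / (1.0 + z * t)
    return total * h

def eta_inverse_gamma(y, c):
    """Right-hand side of (4.4) for mu_Theta = delta_1, where eta^{-1}(y) = (1 - y)/y."""
    return ((1.0 - y) / y) / (1.0 - c + c * y)

c = 0.5
z = 0.7
y = eta_marchenko_pastur(z, c)          # eta of mu_Gamma = mu_c at z
recovered_z = eta_inverse_gamma(y, c)   # should give back z if (4.4) holds
total_mass = eta_marchenko_pastur(0.0, c)
```

If the assumed normalizations match those of the paper, `recovered_z` agrees with `z` up to quadrature error, and `total_mass` equals 1 because the density integrates to one.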

Using (4.5), the following relationship with multiplicative free convolution will be shown:

Theorem 4.1. Under the conditions 1) and 2), assume that

(4.6)    $F^{\Gamma_n} \to F^{\mu_\Theta \boxtimes \mu_c}$.

Then

(4.7)    $F^{W_n} \to F^{(\mu_\Theta \boxplus \mu_{\sigma^2 I}) \boxtimes \mu_c}$ a.s.

Equivalently, assume that

(4.8)    $F^{\Gamma_n} \to F^{\mu_\Gamma}$.

Then

(4.9)    $F^{W_n} \to F^{\mu_W}$ a.s.

where $\mu_W$ is uniquely identified by the equation

(4.10)    $\mu_W \boxslash \mu_c = (\mu_\Gamma \boxslash \mu_c) \boxplus \mu_{\sigma^2 I}$.

Theorem 4.1 will be proved in section 4.3.
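At the level of low-order moments, the limit in (4.7) can be made explicit. The sketch below assumes that $\mu_c$ is the Marchenko Pastur law with free cumulants $c^{k-1}$ (mean 1), and uses two standard facts: additive free convolution with the point mass $\mu_{\sigma^2 I}$ is an ordinary shift by $\sigma^2$, and the first three moments of $\mu \boxtimes \mu_c$ are $m_1$, $m_2 + c m_1^2$ and $m_3 + 3c m_1 m_2 + c^2 m_1^3$. The function names are illustrative and not taken from the paper.

```python
from math import comb

def shift_moments(m, s):
    """Moments of the distribution shifted by s (additive free convolution with delta_s)."""
    full = [1.0] + list(m)                        # prepend the zeroth moment
    return [sum(comb(k, j) * full[j] * s ** (k - j) for j in range(k + 1))
            for k in range(1, len(full))]

def times_marchenko_pastur(theta, c):
    """First three moments of mu boxtimes mu_c from the first three moments of mu."""
    t1, t2, t3 = theta
    return [t1,
            t2 + c * t1 ** 2,
            t3 + 3 * c * t1 * t2 + c ** 2 * t1 ** 3]

def information_plus_noise_moments(theta, sigma2, c):
    """First three moments of (mu_Theta boxplus mu_{sigma^2 I}) boxtimes mu_c."""
    return times_marchenko_pastur(shift_moments(theta, sigma2), c)

c, sigma2 = 0.5, 0.25
theta = [1.0, 1.0, 1.0]                           # mu_Theta = delta_1
w = information_plus_noise_moments(theta, sigma2, c)
```

For $\mu_\Theta = \delta_1$ the result is a Marchenko Pastur law scaled by $1 + \sigma^2$, so the $k$-th entry of `w` is $(1 + \sigma^2)^k$ times the $k$-th Marchenko Pastur moment.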
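4.1. The proof of lemma 4.1. Rewritten in terms of the Stieltjes transform, (4.1) says that (with terms somewhat regrouped)

    $\frac{m_{\mu_W}}{1 + \sigma^2 c\, m_{\mu_W}} = m_{\mu_\Gamma}\left( (1 + \sigma^2 c\, m_{\mu_W})^2 z - \sigma^2 (1 - c)(1 + \sigma^2 c\, m_{\mu_W}) \right)$,

where $m_{\mu_W}$, $m_{\mu_\Gamma}$ are evaluated in $z$ when the parameter is omitted. We restrict ourselves to $z$ on the negative real line. The relation holds also for such $z$, since we can analytically continue to the negative real line. Evaluating in $m_{\mu_W}^{-1}(z)$ we get

    $\frac{z}{1 + \sigma^2 c z} = m_{\mu_\Gamma}\left( (1 + \sigma^2 c z)^2\, m_{\mu_W}^{-1}(z) - \sigma^2 (1 - c)(1 + \sigma^2 c z) \right)$

for $z$ in some interval $(0, z_1)$. We will find it convenient to work with the inverse of the Stieltjes transform, so we rewrite the expression to

    $m_{\mu_\Gamma}^{-1}\left( \frac{z}{1 + \sigma^2 c z} \right) = (1 + \sigma^2 c z)^2\, m_{\mu_W}^{-1}(z) - \sigma^2 (1 - c)(1 + \sigma^2 c z)$.

Substituting $u = \frac{z}{1 + \sigma^2 c z}$ (or equivalently $z = \frac{u}{1 - \sigma^2 c u}$) (this is an isomorphism of the positive real axis which sends 0 to 0), and writing $z$ for $u$ again, we get

    $m_{\mu_\Gamma}^{-1}(z) = \frac{m_{\mu_W}^{-1}\left( \frac{z}{1 - \sigma^2 c z} \right)}{(1 - \sigma^2 c z)^2} - \frac{\sigma^2 (1 - c)}{1 - \sigma^2 c z}$

for $z$ in some interval $(0, z_1)$, so that

    $m_{\mu_W}^{-1}\left( \frac{z}{1 - \sigma^2 c z} \right) = (1 - \sigma^2 c z)^2\, m_{\mu_\Gamma}^{-1}(z) + \sigma^2 (1 - c)(1 - \sigma^2 c z)$,

which is (4.2).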



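4.2. The proof of lemma 4.2. By the multiplicative property of the S-transform we have

    $S_{\mu_\Gamma}(z) = S_{\mu_\Theta}(z)\, S_{\mu_c}(z)$.

Expressed in terms of the $\eta$-transform this can be written

    $\eta_{\mu_\Gamma}^{-1}(z) = \frac{\eta_{\mu_\Theta}^{-1}(z)}{1 - c + cz}$,

which is (4.4). Evaluating in $\eta_{\mu_\Theta}(z)$ and applying $\eta_{\mu_\Gamma}$ on both sides gives

    $\eta_{\mu_\Gamma}\left( \frac{z}{1 - c + c\, \eta_{\mu_\Theta}(z)} \right) = \eta_{\mu_\Theta}(z)$

for $z > 0$. This can also be expressed in terms of Stieltjes transforms as

    $\frac{1 - c + c\, \eta_{\mu_\Theta}(z)}{z}\, m_{\mu_\Gamma}\left( -\frac{1 - c + c\, \eta_{\mu_\Theta}(z)}{z} \right) = \frac{1}{z}\, m_{\mu_\Theta}\left( -\frac{1}{z} \right)$.

Regrouping terms and substituting $-\frac{1}{z}$ for $z$ we get

    $m_{\mu_\Gamma}\left( z \left( 1 - c - c z\, m_{\mu_\Theta}(z) \right) \right) = \frac{m_{\mu_\Theta}(z)}{1 - c - c z\, m_{\mu_\Theta}(z)}$

for $z < 0$. Substituting $m_{\mu_\Theta}^{-1}(z)$ for $z$ and taking the inverse Stieltjes transform $m_{\mu_\Gamma}^{-1}$ we get

    $m_{\mu_\Gamma}^{-1}\left( \frac{z}{1 - c - c z\, m_{\mu_\Theta}^{-1}(z)} \right) = \left( 1 - c - c z\, m_{\mu_\Theta}^{-1}(z) \right) m_{\mu_\Theta}^{-1}(z)$

for $z$ in some interval $(0, z_1)$, which is (4.5).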



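4.3. The proof of theorem 4.1. Note that if

    $z = \frac{z_1}{1 - c - c z_1\, m_{\mu_\Theta}^{-1}(z_1)}$

for $z_1$ positive and close to 0, then we have

(4.11)    $\frac{z}{1 - \sigma^2 c z} = \frac{z_1}{1 - c - c z_1 \left( m_{\mu_\Theta}^{-1}(z_1) + \sigma^2 \right)}$.

Note also that $\frac{z}{1 - \sigma^2 c z} > 0$ as long as $z < \frac{1}{\sigma^2 c}$. Substituting (4.11) and (4.5) in (4.2), and writing $m = m_{\mu_\Theta}^{-1}(z_1)$ and $D = 1 - c - c z_1 m$ for brevity, we get

(4.12)
    $m_{\mu_W}^{-1}\left( \frac{z_1}{1 - c - c z_1 (m + \sigma^2)} \right)$
    $= \frac{(D - \sigma^2 c z_1)^2\, m + \sigma^2 (1 - c)(D - \sigma^2 c z_1)}{D}$
    $= \frac{(D - \sigma^2 c z_1) \left( (D - \sigma^2 c z_1)\, m + \sigma^2 (1 - c) \right)}{D}$
    $= \frac{\left( 1 - c - c z_1 (m + \sigma^2) \right) D\, (m + \sigma^2)}{D}$
    $= (m + \sigma^2) \left( 1 - c - c z_1 (m + \sigma^2) \right)$
    $= m_{\mu_\Theta \boxplus \mu_{\sigma^2 I}}^{-1}(z_1) \left( 1 - c - c z_1\, m_{\mu_\Theta \boxplus \mu_{\sigma^2 I}}^{-1}(z_1) \right)$.

Here we have used that $m_{\mu_\Theta \boxplus \mu_{\sigma^2 I}}^{-1}(z_1) = m_{\mu_\Theta}^{-1}(z_1) + \sigma^2$, which follows from the additivity property of the R-transform and the fact that the inverse of the Stieltjes transform is used to define the R-transform (perform additive free convolution with $\mu_{\sigma^2 I}$). (4.12) is thus nothing else than (4.5) (with $m_{\mu_\Theta}$ replaced by $m_{\mu_\Theta \boxplus \mu_{\sigma^2 I}}$). Since (4.5) is just an equivalent expression for multiplicative free convolution, we therefore have

    $\mu_W = (\mu_\Theta \boxplus \mu_{\sigma^2 I}) \boxtimes \mu_c$,

or equivalently

    $\mu_W \boxslash \mu_c = (\mu_\Gamma \boxslash \mu_c) \boxplus \mu_{\sigma^2 I}$.

This completes the proof.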

5. Using G-analysis to estimate the spectral function of covari- 
ance matrices. It turns out that multiplicative free deconvolution can 
also be used to estimate covariance matrices. The general statistical anal- 
ysis of observations, also called G- analysis [H| is a mathematical theory for 
complex systems where the number of parameters of the underlying mathe- 
matical model increase together with the growth of the number of observa- 
tions of the system. The mathematical models which approach the system 
in some sense are called G -estimators. The main difficulty in G-analysis is 
to find good G-estimators. G-estimators have already shown their usefulness 
in many applications 0]. We denote by N the number of observations of the 
system, and by n the number of parameters of the mathematical model. The 
condition used in G-analysis expressing the growth of the number of obser- 
vations vs. the number of parameters in the mathematical model, is called 
the G-condition. The G-condition used throughout this paper is (1.2).

Girko restricts to systems where a number of independent random vector observations are taken, and where the random vectors have identical distributions. If a random vector $r_n$ has length $n$, we will let $\Theta_n$ denote its covariance, while $\Gamma_n$ will still denote sample covariance matrices. The
we analyze in this section are more restrictive than in previous sections, 
since independence across samples is assumed. Girko calls estimators for the 
Stieltjes transform of covariance matrices G'^ -estimators. In chapter 2.1 of 
he introduces the following expression as candidate for a G^-estimator: 

(5.1)    $G_{2,n}(z) = \frac{\theta(z)}{z}\, m_{\mu_{\Gamma_n}}(\theta(z))$,

where the function $\theta(z)$ is the solution to the equation

(5.2)    $\theta(z)\, c\, m_{\mu_{\Gamma_n}}(\theta(z)) - (1 - c) + \frac{\theta(z)}{z} = 0$.

Girko claims that a function $G_{2,n}(z)$ satisfying (5.2) and (5.1) is a good approximation for the Stieltjes transform of the covariance matrices $m_{\Theta_n}(z) = \mathrm{tr}_n (\Theta_n - z I_n)^{-1}$. More precisely, he shows that when (5.1), (5.2) and the G-condition (1.2) are fulfilled, under certain conditions there exists a $c > 0$ such that

(5.3)    $\lim_{n \to \infty} \sup_{0 < c \le \Im(z) \le S,\ |\Re(z)| \le T} \left| G_{2,n}(z) - m_{\Theta_n}(z) \right| = 0$,

with probability one for every $S > 0$ and $T > 0$. According to Girko, analytical continuation of $G_{2,n}(z)$ can be performed to obtain limits for other $z$ than the ones in (5.3).

As it turns out, the $G_2$-estimator can equivalently be expressed in terms of multiplicative free convolution:

Theorem 5.1. For the $G_2$-estimator given by (5.1), (5.2), the following holds for real $z < 0$:

(5.4)    $G_{2,n}(z) = m_{\mu_{\Gamma_n} \boxslash \mu_c}(z)$

Proof. ()5.2p can be rewritten to 



c\z J J e{z) 



which we will write 

,-1 (i ( e(z) 



(5.5) ^^-i ^ = e{z) 



[i (^^ - (1 - c) 

Denote by the measure with Stieltjes transform G'^{z). (j5.ip can be rewrit- 
ten using the ?7-transform as 

Since t?^^^ and rj^ are monotone, it is easily seen from this that 9 is mono- 
tone since it is a combination of monotone functions. Forming the inverse 
functions on both sides, and also applying 6, yields 
Showing $\mu_{\Gamma_n} = \mu \boxtimes \mu_c$ is equivalent to (after rearranging (4.4))

1 1 



Applying 9 on both sides and using (j5.6p yields that this is equivalent to 
(5.7, ' =^ ' ^ 



Observe now that ()5.7p and ()5.5p are related in the following way: If we 
substitute z = ^ ~ (1 ~ '^^^ (|5.7p . the argument on the right hand 

side can be rewritten using (j5.5p to 

1 



SO that (|5.7p is nothing but a restatement of (jS.Sp . at least on values of 
the form z = ^ (^^IIT ~ ~ '^)^ ■ these values take on an open set of 

real values, equality in (5.7) follows for all $z$ by analytic continuation. This happens when $\theta(z) \ne kz$ for some constant $k$. If $\theta(z) = kz$, then $\eta_{\mu_{\Gamma_n}}$ is seen to be constant, which only happens in trivial cases. Thus we have that $\mu_{\Gamma_n} = \mu \boxtimes \mu_c$ and we are done. □

Several remarks concerning theorem 5.1 are in place. First of all, the $G_2$-estimator has a much shorter expression in terms of multiplicative free deconvolution, which also places it as an ingredient in theorem 1.1. The theorem is nice to combine with continuity results for free convolution. Voiculescu
has proved such results when convergence is in the weak-* topology [l[. This 
enables us in many cases to conclude that 

    $\lim_{n \to \infty} G_{2,n}(z) = \lim_{n \to \infty} m_{\mu_{\Gamma_n} \boxslash \mu_c}(z) = m_{\mu_\Gamma \boxslash \mu_c}(z)$



for some probability measure /ir- Secondly, ij] expresses the exact same 
estimator, i.e. 

    $\lim_{n \to \infty} \mu_{\Gamma_n} \boxslash \mu_c = \lim_{n \to \infty} \mu_{\Theta_n}$

in the case of Gaussian systems. Theorem 5.1 can be seen as a way of generalizing from the Gaussian case.
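At the level of the first three moments, multiplicative free deconvolution with the Marchenko Pastur law amounts to inverting an explicit polynomial map, which gives a simple way to see what the estimator does to the sample spectrum. The following sketch assumes $\mu_c$ has free cumulants $c^{k-1}$ (mean 1) and treats only moments of order at most three; the function names are illustrative, and this is not Girko's construction, only a moment-based counterpart of the deconvolution discussed here.

```python
def convolve_with_mp(theta, c):
    """First three moments of mu_Theta boxtimes mu_c from those of mu_Theta."""
    t1, t2, t3 = theta
    return [t1,
            t2 + c * t1 ** 2,
            t3 + 3 * c * t1 * t2 + c ** 2 * t1 ** 3]

def deconvolve_from_mp(gamma, c):
    """Invert convolve_with_mp: recover the moments of mu_Theta from those of mu_Gamma."""
    g1, g2, g3 = gamma
    t1 = g1
    t2 = g2 - c * t1 ** 2
    t3 = g3 - 3 * c * t1 * t2 - c ** 2 * t1 ** 3
    return [t1, t2, t3]

c = 0.5
theta = [2.0, 5.0, 14.0]                 # assumed moments of a covariance spectrum
gamma = convolve_with_mp(theta, c)       # corresponding sample covariance moments
recovered = deconvolve_from_mp(gamma, c)

# Sample moments of a Marchenko Pastur law deconvolve to those of the identity.
identity_case = deconvolve_from_mp([1.0, 1.5, 2.75], c)
```

In practice the input would be empirical moments $\mathrm{tr}_n(\Gamma_n^k)$ with $c$ replaced by $n/N$, so the recovered values are estimates rather than exact moments.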
6. Further work. The concept of freeness and free convolution can be extended to unbounded random variables and general probability measures.
In [H it is shown how this can be done in the context of unbounded operator 
spaces, and certain regularity properties are proved. For instance, if $\mu_n \to \mu$ and $\nu_n \to \nu$ in the weak-* topology with both $\mu \ne \delta_0$ and $\nu \ne \delta_0$, then $\mu_n \boxtimes \nu_n \to \mu \boxtimes \nu$ in the weak-* topology also. It is possible that applying such
extensions together with the methods applied here can extend the results to 
the same generality as those in [3]. This may be addressed in a future paper. 

The $G_2$-estimator is just one of many estimators introduced by Girko. He has estimators for many other quantities also [4], like for the square root and the moments of covariance matrices. Certain of these estimators may also have alternative expressions in terms of free probability constructs.